Re: [BUG] Corrupted SCHED_DEADLINE bandwidth with cpusets

From: Steven Rostedt
Date: Thu Feb 04 2016 - 12:31:12 EST


On Thu, 4 Feb 2016 16:30:49 +0000
Juri Lelli <juri.lelli@xxxxxxx> wrote:

> I've actually changed a bit this approach, and things seem better here.
> Could you please give this a try? (You can also fetch the same branch).

It appears to fix the one issue I pointed out, but it doesn't fix the
issue with cpusets.

# burn&
# TASK=$!
# schedtool -E -t 2000000:20000000 $TASK
# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[1]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[2]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[3]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[4]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[5]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[6]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[7]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857

# mkdir /sys/fs/cgroup/cpuset/my_cpuset
# echo 1 > /sys/fs/cgroup/cpuset/my_cpuset/cpuset.cpus
# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[1]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[2]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[3]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[4]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[5]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[6]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714
dl_rq[7]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 209714

It appears to add double the bandwidth.

# kill $TASK
# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[1]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[2]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[3]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[4]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[5]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[6]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[7]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857

Now we have used bandwidth with nothing running.

# rmdir /sys/fs/cgroup/cpuset/my_cpuset
# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[1]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[2]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[3]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[4]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[5]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[6]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857
dl_rq[7]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 104857

And now that bandwidth is leaked, but it seems we can get it back with
the old sched_load_balance trick.

# echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance
# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[1]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[2]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[3]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[4]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[5]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[6]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[7]:
.dl_nr_running : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0


-- Steve