Re: [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes.

From: Vivek Goyal
Date: Tue Mar 22 2011 - 21:28:22 EST

Next message: Guan Xuetao: "RE: [PATCH] ARM: Use asm-generic/sizes.h"
Previous message: KOSAKI Motohiro: "Re: [PATCH 3/5] oom: create oom autogroup"
In reply to: Justin TerAvest: "[PATCH v2 6/8] cfq: add per cgroup writeout done by flusher stat"
Next in thread: Justin TerAvest: "Re: [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Mar 22, 2011 at 04:08:47PM -0700, Justin TerAvest wrote:

[..]
> ===================================== Isolation experiment results
>
> For isolation testing, we run a test that's available at:
> git://google3-2.osuosl.org/tests/blkcgroup.git
>
> It creates containers, runs workloads, and checks to see how well we meet
> isolation targets. For the purposes of this patchset, I only ran
> tests among buffered writers.
>
> Before patches
> ==============
> 10:32:06 INFO experiment 0 achieved DTFs: 666, 333
> 10:32:06 INFO experiment 0 FAILED: max observed error is 167, allowed is 150
> 10:32:51 INFO experiment 1 achieved DTFs: 647, 352
> 10:32:51 INFO experiment 1 FAILED: max observed error is 253, allowed is 150
> 10:33:35 INFO experiment 2 achieved DTFs: 298, 701
> 10:33:35 INFO experiment 2 FAILED: max observed error is 199, allowed is 150
> 10:34:19 INFO experiment 3 achieved DTFs: 445, 277, 277
> 10:34:19 INFO experiment 3 FAILED: max observed error is 155, allowed is 150
> 10:35:05 INFO experiment 4 achieved DTFs: 418, 104, 261, 215
> 10:35:05 INFO experiment 4 FAILED: max observed error is 232, allowed is 150
> 10:35:53 INFO experiment 5 achieved DTFs: 213, 136, 68, 102, 170, 136, 170
> 10:35:53 INFO experiment 5 PASSED: max observed error is 73, allowed is 150
> 10:36:04 INFO -----ran 6 experiments, 1 passed, 5 failed
>
> After patches
> =============
> 11:05:22 INFO experiment 0 achieved DTFs: 501, 498
> 11:05:22 INFO experiment 0 PASSED: max observed error is 2, allowed is 150
> 11:06:07 INFO experiment 1 achieved DTFs: 874, 125
> 11:06:07 INFO experiment 1 PASSED: max observed error is 26, allowed is 150
> 11:06:53 INFO experiment 2 achieved DTFs: 121, 878
> 11:06:53 INFO experiment 2 PASSED: max observed error is 22, allowed is 150
> 11:07:46 INFO experiment 3 achieved DTFs: 589, 205, 204
> 11:07:46 INFO experiment 3 PASSED: max observed error is 11, allowed is 150
> 11:08:34 INFO experiment 4 achieved DTFs: 616, 109, 109, 163
> 11:08:34 INFO experiment 4 PASSED: max observed error is 34, allowed is 150
> 11:09:29 INFO experiment 5 achieved DTFs: 139, 139, 139, 139, 140, 141, 160
> 11:09:29 INFO experiment 5 PASSED: max observed error is 1, allowed is 150
> 11:09:46 INFO -----ran 6 experiments, 6 passed, 0 failed
>
> Summary
> =======
> Isolation between buffered writers is clearly better with this patch.

Can you please explain what this test is doing? All I am seeing is passed
and failed, and I really don't understand what is being measured.

Can you run, say, 4 simple dd buffered writers in 4 cgroups with weights
100, 200, 300 and 400 and see if you get better isolation?
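
As a rough sketch of how such a run could be scored: under proportional
weights, each group's ideal share of service is weight / sum(weights), so
weights 100, 200, 300 and 400 would ideally give 10%, 20%, 30% and 40%.
The helper names below (expected_shares, max_share_error) are made up for
illustration, and treating bytes written per dd as the measure of service
is an assumption, not something the patchset or the test defines.

```python
from typing import List, Sequence


def expected_shares(weights: Sequence[int]) -> List[float]:
    """Ideal fraction of service per cgroup: weight / sum(weights)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def max_share_error(weights: Sequence[int], measured: Sequence[float]) -> float:
    """Largest absolute gap between achieved and ideal share.

    `measured` is any per-cgroup quantity proportional to service received,
    e.g. bytes written by each dd over a fixed interval (an assumption).
    """
    if len(weights) != len(measured):
        raise ValueError("need one measurement per weight")
    total = sum(measured)
    if total <= 0:
        raise ValueError("measurements must sum to a positive value")
    ideal = expected_shares(weights)
    achieved = [m / total for m in measured]
    return max(abs(a - i) for a, i in zip(achieved, ideal))


if __name__ == "__main__":
    # Ideal split for the four proposed dd writers.
    print(expected_shares([100, 200, 300, 400]))
```

Feeding the per-group throughput of the four dd writers into
max_share_error() gives a single number that can be compared before and
after the patches.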

Secondly, can you also please explain how it works? Without
making writeback cgroup aware, there are no guarantees that a higher
weight cgroup will get more IO done.

>
>
> =============================== Read latency results
> To test read latency, I created two containers:
> - One called "readers", with weight 900
> - One called "writers", with weight 100
>
> I ran this fio workload in "readers":
> [global]
> directory=/mnt/iostestmnt/fio
> runtime=30
> time_based=1
> group_reporting=1
> exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
> cgroup_nodelete=1
> bs=4K
> size=512M
>
> [iostest-read]
> description="reader"
> numjobs=16
> rw=randread
> new_group=1
>
>
> ....and this fio workload in "writers"
> [global]
> directory=/mnt/iostestmnt/fio
> runtime=30
> time_based=1
> group_reporting=1
> exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
> cgroup_nodelete=1
> bs=4K
> size=512M
>
> [iostest-write]
> description="writer"
> cgroup=writers
> numjobs=3
> rw=write
> new_group=1
>
>
>
> I've pasted the results from the "read" workload inline.
>
> Before patches
> ==============
> Starting 16 processes
>
> Jobs: 14 (f=14): [_rrrrrr_rrrrrrrr] [36.2% done] [352K/0K /s] [86 /0 iops] [eta 01m:00s]
> iostest-read: (groupid=0, jobs=16): err= 0: pid=20606
> Description : ["reader"]
> read : io=13532KB, bw=455814 B/s, iops=111 , runt= 30400msec
> clat (usec): min=2190 , max=30399K, avg=30395175.13, stdev= 0.20
> lat (usec): min=2190 , max=30399K, avg=30395177.07, stdev= 0.20
> bw (KB/s) : min= 0, max= 260, per=0.00%, avg= 0.00, stdev= 0.00
> cpu : usr=0.00%, sys=0.03%, ctx=3691, majf=2, minf=468
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3383/0/0, short=0/0/0
>
> lat (msec): 4=0.03%, 10=2.66%, 20=74.84%, 50=21.90%, 100=0.09%
> lat (msec): 250=0.06%, >=2000=0.41%
>
> Run status group 0 (all jobs):
> READ: io=13532KB, aggrb=445KB/s, minb=455KB/s, maxb=455KB/s, mint=30400msec, maxt=30400msec
>
> Disk stats (read/write):
> sdb: ios=3744/18, merge=0/16, ticks=542713/1675, in_queue=550714, util=99.15%
>
>
>
> After patches
> =============
> Starting 16 processes
> Jobs: 16 (f=16): [rrrrrrrrrrrrrrrr] [100.0% done] [557K/0K /s] [136 /0 iops] [eta 00m:00s]
> iostest-read: (groupid=0, jobs=16): err= 0: pid=14183
> Description : ["reader"]
> read : io=14940KB, bw=506105 B/s, iops=123 , runt= 30228msec
> clat (msec): min=2 , max=29866 , avg=463.42, stdev=101.84
> lat (msec): min=2 , max=29866 , avg=463.42, stdev=101.84
> bw (KB/s) : min= 0, max= 198, per=31.69%, avg=156.52, stdev=17.83
> cpu : usr=0.01%, sys=0.03%, ctx=4274, majf=2, minf=464
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3735/0/0, short=0/0/0
>
> lat (msec): 4=0.05%, 10=0.32%, 20=32.99%, 50=64.61%, 100=1.26%
> lat (msec): 250=0.11%, 500=0.11%, 750=0.16%, 1000=0.05%, >=2000=0.35%
>
> Run status group 0 (all jobs):
> READ: io=14940KB, aggrb=494KB/s, minb=506KB/s, maxb=506KB/s, mint=30228msec, maxt=30228msec
>
> Disk stats (read/write):
> sdb: ios=4189/0, merge=0/0, ticks=96428/0, in_queue=478798, util=100.00%
>
>
>
> Summary
> =======
> Read latencies are a bit worse, but this overhead is only imposed when users
> ask for this feature by turning on CONFIG_BLKIOTRACK. We expect there to be
> something of a latency vs isolation tradeoff.

- What number are you looking at to say READ latencies are worse?
- Who got isolated here? If READ latencies are worse and you are saying
that's the cost of isolation, that means you are looking for isolation
for WRITES? This is the first time I am hearing a complaint that READS
starved WRITES and that WRITES need better isolation.
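
One candidate number, as far as I can tell from the quoted reports, is the
latency histogram rather than the averages. The sketch below sums the fio
"lat (msec)" buckets up to a limit for both runs. It assumes the buckets are
non-cumulative (each percentage covers only its own range), and the helper
names parse_buckets and pct_within are hypothetical. The bucket strings are
copied from the two reports above.

```python
import re
from typing import Dict

# Latency-bucket lines copied from the two fio reports quoted above.
BEFORE = ("4=0.03%, 10=2.66%, 20=74.84%, 50=21.90%, 100=0.09%, "
          "250=0.06%, >=2000=0.41%")
AFTER = ("4=0.05%, 10=0.32%, 20=32.99%, 50=64.61%, 100=1.26%, "
         "250=0.11%, 500=0.11%, 750=0.16%, 1000=0.05%, >=2000=0.35%")


def parse_buckets(text: str) -> Dict[str, float]:
    """Map each bucket label (e.g. '20' or '>=2000') to its percentage."""
    return {label: float(pct)
            for label, pct in re.findall(r"(>=\d+|\d+)=([\d.]+)%", text)}


def pct_within(buckets: Dict[str, float], limit_ms: int) -> float:
    """Percentage of I/Os in buckets whose upper bound is <= limit_ms."""
    return sum(pct for label, pct in buckets.items()
               if not label.startswith(">=") and int(label) <= limit_ms)


if __name__ == "__main__":
    for name, text in (("before", BEFORE), ("after", AFTER)):
        share = pct_within(parse_buckets(text), 20)
        print(f"{name}: {share:.2f}% of reads completed within 20 msec")
```

By this measure about 77.5% of reads finished within 20 msec before the
patches and about 33.4% after, which is one concrete way to state the
regression.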

Also CONFIG_BLKIOTRACK=n is not the solution. This will most likely be
set, and we need to figure out what makes sense.

To me WRITE isolation comes in handy only if we want to create a speed
difference between multiple WRITE streams. And that cannot reliably be
done until we make the writeback logic cgroup aware.

If we try to put WRITES in a separate group, most likely WRITES will end
up getting a bigger share of the disk than they are getting by default, and
I seriously doubt that anyone is looking for that. So far all the complaints
I have heard are that in the presence of WRITES, READ latencies suffer, and
not vice versa.

Thanks
Vivek