Re: [BUG] Writeback Cgroup/Dirty Throttle: very small buffered write throughput caused by writeback cgroup and dirty throttle

From: Miao Xie
Date: Fri May 13 2016 - 02:14:08 EST


On 2016/5/12 at 23:32, Tejun Heo wrote:
On Thu, May 12, 2016 at 09:11:33AM +0800, Miao Xie wrote:
My box has 48 cores and 188GB memory, but I set
vm.dirty_background_bytes = 268435456
vm.dirty_bytes = 536870912

If I set vm.dirty_background_bytes and vm.dirty_bytes to large values (vm.dirty_background_bytes = 3GB,
vm.dirty_bytes = 4GB), fio throughput is more than 1500MB/s. If I then reset them to the original
values (the ones above), the throughput drops to 500MB/s.

According to my debugging, fio slept for 1ms every time it dirtied a page (in balance_dirty_pages()) when
the throughput was down to 4MB/s. I think it might be a bug in dirty throttling when the writeback cgroup is enabled.

Heh, so, for cgroups, the absolute byte limits can't be applied directly
and are converted to percentage values before being applied. You're
specifying 0.27% for the threshold. Unfortunately, the ratio is
translated into a whole percentage number and 0.27% becomes 0, so your
cgroups are always over limit and being throttled.
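
To make the truncation concrete, here is a minimal sketch of the percentage conversion as it appears in the lines removed by the patch below. The function name old_bytes_to_percent is hypothetical, the 4 KiB page size is an assumption, and the example numbers treat all 188GB as globally available memory, which overstates what the kernel would actually count as dirtyable.

```c
/* Assumption: 4 KiB pages; the real value is architecture dependent. */
#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper mirroring the pre-patch expression:
 * convert a byte limit to pages, scale by 100, and divide by the
 * globally available page count. Integer division drops the fraction,
 * so any limit below 1% of global_avail becomes 0.
 */
unsigned long old_bytes_to_percent(unsigned long bytes,
                                   unsigned long global_avail)
{
	unsigned long pct = DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 / global_avail;

	return pct < 100UL ? pct : 100UL;
}

/* Pre-patch threshold in pages for a domain with available_memory pages. */
unsigned long old_thresh(unsigned long pct, unsigned long available_memory)
{
	return pct * available_memory / 100;
}
```

With vm.dirty_bytes = 536870912 and global_avail = 49283072 pages (188GB at 4 KiB per page), old_bytes_to_percent() returns 0, so old_thresh() is 0 for any cgroup regardless of its size.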

Can you please see whether the following patch fixes the issue?

Better than the kernel without the patch. Now the benchmark can reach the device bandwidth after 5-8 seconds.
But at the beginning it was still very slow: its throughput was only 4MB/s for ~4 seconds, then it
ramped up over 1~3 seconds.

Thanks
Miao

Thanks.

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 999792d..a455a21 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -369,8 +369,9 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 	struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc);
 	unsigned long bytes = vm_dirty_bytes;
 	unsigned long bg_bytes = dirty_background_bytes;
-	unsigned long ratio = vm_dirty_ratio;
-	unsigned long bg_ratio = dirty_background_ratio;
+	/* convert ratios to per-PAGE_SIZE for higher precision */
+	unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100;
+	unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100;
 	unsigned long thresh;
 	unsigned long bg_thresh;
 	struct task_struct *tsk;
@@ -382,26 +383,28 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		/*
 		 * The byte settings can't be applied directly to memcg
 		 * domains. Convert them to ratios by scaling against
-		 * globally available memory.
+		 * globally available memory. As the ratios are in
+		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
+		 * pages.
 		 */
 		if (bytes)
-			ratio = min(DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 /
-				    global_avail, 100UL);
+			ratio = min(DIV_ROUND_UP(bytes, global_avail),
+				    PAGE_SIZE);
 		if (bg_bytes)
-			bg_ratio = min(DIV_ROUND_UP(bg_bytes, PAGE_SIZE) * 100 /
-				       global_avail, 100UL);
+			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
+				       PAGE_SIZE);
 		bytes = bg_bytes = 0;
 	}

 	if (bytes)
 		thresh = DIV_ROUND_UP(bytes, PAGE_SIZE);
 	else
-		thresh = (ratio * available_memory) / 100;
+		thresh = (ratio * available_memory) / PAGE_SIZE;

 	if (bg_bytes)
 		bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
 	else
-		bg_thresh = (bg_ratio * available_memory) / 100;
+		bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;

 	if (bg_thresh >= thresh)
 		bg_thresh = thresh / 2;
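
For readers following the arithmetic, the sketch below restates the patched calculation outside the kernel. The ratio is kept in units of 1/PAGE_SIZE instead of 1/100, and since (bytes / PAGE_SIZE) / pages * PAGE_SIZE simplifies to bytes / pages, the conversion is a single rounded-up division. The function names are hypothetical, the 4 KiB page size is an assumption, and the sample numbers treat all 188GB as globally available memory.

```c
/* Assumption: 4 KiB pages; the real value is architecture dependent. */
#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper mirroring the patched expression: the result is a
 * fixed-point ratio in units of 1/PAGE_SIZE, capped at PAGE_SIZE (100%).
 */
unsigned long bytes_to_page_ratio(unsigned long bytes,
                                  unsigned long global_avail)
{
	unsigned long ratio = DIV_ROUND_UP(bytes, global_avail);

	return ratio < PAGE_SIZE ? ratio : PAGE_SIZE;
}

/* Patched threshold in pages for a domain with available_memory pages. */
unsigned long ratio_to_thresh(unsigned long ratio,
                              unsigned long available_memory)
{
	return ratio * available_memory / PAGE_SIZE;
}
```

With vm.dirty_bytes = 536870912 and global_avail = 49283072 pages, bytes_to_page_ratio() returns 11 (about 0.27% expressed as 11/4096), and vm.dirty_background_bytes = 268435456 gives 6. Applying the ratio of 11 back to the same 49283072 pages yields a threshold of 132352 pages, slightly above the requested 131072 because the division rounds up.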
