Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status

From: Gui Jianfeng
Date: Fri Jun 11 2010 - 02:33:34 EST

Divyesh Shah wrote:
> On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>>> Vivek Goyal wrote:
>>>> On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>>>>> Vivek Goyal wrote:
>>>>>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>>>>>>> Vivek Goyal wrote:
>>>>>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>>>>>>>>> Hi,
>>>>>>>>> This series implements three new interfaces to keep track of tranferred bytes,
>>>>>>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
>>>>>>>>> from service tree, these three interfaces will reset and shows zero.
>>>>>>>> Hi Gui,
>>>>>>>> Can you give some details regarding how this functionality is useful? Why
>>>>>>>> would somebody be interested in only in stats of till group was
>>>>>>>> backlogged and not in total stats?
>>>>>>>> Groups can come and go so fast and these stats will reset so many times
>>>>>>>> that I am not able to visualize how these stats will be useful.
>>>>>>> Hi Vivek,
>>>>>>> Currently, we assign weight to a group, but user still doesn't know how fast the
>>>>>>> group runs. With io rate interface, users can check the rate of a group at any
>>>>>>> moment, or to determine whether the weight assigned to a group is enough.
>>>>>>> bytes and time interface is just for debug purpose.
>>>>>> Gui,
>>>>>> I still don't understand that why blkio.sectors or blkio.io_service_bytes
>>>>>> or blkio.io_serviced interfaces are not good enough to determine at what
>>>>>> rate a group is doing IO.
>>>>>> I think we can very well write something in userspace like "iostat" to
>>>>>> display the per group rate. Utility can read the any of the above files
>>>>>> say at the interfval of 1s, calculate the diff between the values and
>>>>>> display that as group effective rate.
>>>>> Hi Vivek,
>>>>> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
>>>>> value. This value represents the actual rate a group runs. IMO, io rate calculated from
>>>>> user space is not accurate in following two scenarios:
>>>>> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
>>>>> rate calculated in this interval doesn't make sense.
>>>> If you are not servicing groups for long time, anyway it is very bad for
>>>> latency. So that's why soft limit of 300ms of CFQ makes sense and
>>>> practically I am not sure you will be blocking groups for .5s.
>>>> Even if you do, then user just needs to choose a bigger interval and you
>>>> will see more smooth rates. Reduce the interval and you might see little
>>>> bursty rate.
>>> Vivek,
>>> IIUC, the most big problem for user app is the user app doesn't know how long
>>> the group has been dequeued during the interval. For example, user choose
>>> 10s interval, 8s of which is not backlogged, but when user app calculates
>>> io rate, this 8s still include. So this rate isn't what we want. Am i missing
>>> something?
>> Gui,
>> If user application is not doing enough IO and group is getting deleted
>> fast, io_active_rate is not going to give you any meaningful data as it
>> will be lost the moment group gets deleted.
>> Hence one needs to monitor the IO rate when a workload is running and is
>> keeping disk busy more or less all the time.
>> Even in your example, if you monitored IO rate over 10 second interval and
>> group is not doing any IO, you just can't do anything about it. Just that
>> your measurement e method is wrong. Even io_active_rate will not help you
>> here as by the time you read the file, group is gone and there is no data.
>> The very reason you want to monitor rate is that you want to make sure
>> group is getting enough BW. If group is not doing IO then one can look at
>> blkio.dequeue file and see if group is getting deleted too frequently. If
>> yes, that means group is not doing enough IO to keep the disk busy. One
>> can also try increasing the weight of the group but that will not help
>> much if group does not remain backlogged for significant amount of time.
>>> "io_active_rate" will never take un-backlogged time into account when calculating
>>> io rate.
>> Theoritically blkio.sectors/blkio.time gives the rate excluding the time
>> when group was not backlogged?
> I agree with Vivek here. We use blkio.time as a source for io rate
> count for each cgroup, knowing that it is not entirely accurate but a
> good enough approximation.
> Gui, if you want to find out whether the cgroup has enough weight or
> not, I'd recommend looking at the wait_time stat in addition to
> blkio.time. It has been very useful in identifying jobs that are not
> getting enough IO done due to less weight assigned to them.

Ok, see. :)


>> But I will not recommend using blkio.time as it is very approximate.
>> I really am not able to see what this interface is really buying you.
>> Thanks
>> Vivek
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at
>> Please read the FAQ at
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at