Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO
From: Andrea Righi
Date: Wed Apr 22 2009 - 06:22:57 EST
On Wed, Apr 22, 2009 at 10:21:53AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 22 Apr 2009 09:33:49 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
>
> > > And this should be probably strictly connected to the IO controller. If
> > > we throttle or delay the dispatching/submission of some IO requests
> > > without throttling the dirty pages rate a cgroup could completely waste
> > > its own available memory with dirty (hard and slow to reclaim) pages.
> > >
> > > That is in part the approach I used in io-throttle v12, adding a hook in
> > > balance_dirty_pages_ratelimited_nr() to throttle the current task when
> > > cgroup's IO limits are exceeded. Argh!
> > >
> > > So, another proposal could be to re-add in io-throttle v14 the old hook
> > > also in balance_dirty_pages_ratelimited_nr().
> > >
> > > In this way io-throttle would:
> > >
> > > - use page_cgroup infrastructure and page_cgroup->flags to encode the
> > > id of the cgroup that first dirtied a generic page
> > > - account and appropriately throttle sync and writeback IO requests in
> > > submit_bio()
> > > - at the same time throttle the tasks in
> > > balance_dirty_pages_ratelimited_nr() if the cgroup they belong to has
> > > exhausted the IO BW (or quota, share, etc. in case of proportional BW
> > > limit)
> > >
> >
> > IMHO, io-controller should just work as an I/O subsystem, like bdi. Now, per-bdi dirty_ratio
> > is supported and it seems to work well.
> >
> > Can't we write a function like bdi_writeout_fraction() ?;
> > It will be a simple choice.
> >
> One more thing, if you want dirty_ratio for throttling I/O, not for supporting page reclaim,
> something like task_dirty_limit() will be appropriate.
>
> Thanks,
> -Kame

Actually I was proposing something quite similar, if I've understood
correctly: just add a hook in balance_dirty_pages() to throttle tasks in
cgroups that have exhausted their IO BW.

The way to do so would be similar to the per-bdi write throttling: take
into account the IO requests previously submitted per cgroup and the pages
dirtied per cgroup (considering that they are not necessarily dirtied by
the owner of the page), and apply something like congestion_wait() to
throttle the tasks in the cgroups that exceeded the BW limit.
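
To make the throttling decision a bit more concrete, here is a rough
userspace model of it. Everything in it is an assumption made for
illustration: struct iot_cgroup_stats, its fields and iot_throttle_ms()
are hypothetical names, not existing kernel or io-throttle symbols. It
only shows the arithmetic: sum the IO already submitted and the pages
still dirty for the cgroup, work out how long the BW limit needs to
absorb that amount, and sleep for whatever part of that time has not
elapsed yet. In the kernel the sleep itself would be something like
congestion_wait().

```c
#include <stdint.h>

/* Hypothetical per-cgroup statistics (illustrative names only). */
struct iot_cgroup_stats {
	uint64_t bw_limit;	/* allowed bytes per second, 0 = unlimited */
	uint64_t submitted;	/* bytes of IO already submitted in this window */
	uint64_t dirty_bytes;	/* bytes dirtied on behalf of the cgroup, not yet written */
	uint64_t window_ms;	/* time elapsed in the current accounting window */
};

/*
 * Return how many milliseconds a task of this cgroup should sleep
 * before it is allowed to dirty more pages. 0 means "do not throttle".
 */
uint64_t iot_throttle_ms(const struct iot_cgroup_stats *s)
{
	uint64_t pending, needed_ms;

	if (s->bw_limit == 0)
		return 0;	/* no limit configured for this cgroup */

	/* IO the cgroup has generated directly or indirectly. */
	pending = s->submitted + s->dirty_bytes;

	/* Time the configured bandwidth needs to absorb that much IO. */
	needed_ms = pending * 1000 / s->bw_limit;

	return needed_ms > s->window_ms ? needed_ms - s->window_ms : 0;
}
```

Counting the dirty pages together with the submitted IO is what keeps a
cgroup from filling its memory with dirty pages while its requests are
being held back at the block layer.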

Maybe we can just introduce cgroup_dirty_limit(), simply replicating what
we're doing for task_dirty_limit(), but using per-cgroup statistics of
course.
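
As a sketch of what such a function could look like, the following
userspace model follows the scaling that task_dirty_limit() applies, as
far as I recall it: lower the dirty limit by up to 1/8, in proportion to
the share of recently dirtied pages, and never go below half of the
original limit. The signature is an assumption; in particular, how the
per-cgroup fraction (numerator/denominator) would be collected is left
open here.

```c
#include <stdint.h>

/*
 * Hypothetical model of cgroup_dirty_limit().
 *
 * dirty_limit:            the global dirty threshold, in pages (>= 0)
 * numerator/denominator:  fraction of recent page dirtying attributed
 *                         to this cgroup (per-cgroup statistics)
 *
 * Returns the reduced dirty threshold for tasks in this cgroup.
 */
long cgroup_dirty_limit(long dirty_limit, long numerator, long denominator)
{
	uint64_t inv = (uint64_t)(dirty_limit >> 3);	/* at most a 1/8 penalty */
	long limit;

	/* No usable statistics: leave the limit untouched. */
	if (denominator <= 0 || numerator < 0)
		return dirty_limit;

	/* Scale the penalty by the cgroup's share of recent dirtying. */
	inv = inv * (uint64_t)numerator / (uint64_t)denominator;

	limit = dirty_limit - (long)inv;
	if (limit < dirty_limit / 2)
		limit = dirty_limit / 2;	/* never drop below half */

	return limit;
}
```

With a limit of 8000 pages, a cgroup responsible for all the recent
dirtying gets 7000, one responsible for half of it gets 7500, and an
idle cgroup keeps the full 8000.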

I can change the io-throttle controller to do so. This feature should
also be valid for the proportional BW approach.

BTW Vivek's proposal to also dispatch IO requests according to cgroup
proportional BW limits is still valid and worth testing IMHO. But we
must also find a way to tell the right cgroup: hey! stop wasting memory
on dirty pages, because you've directly or indirectly generated too much
IO in the system and I'm throttling and/or not scheduling your IO
requests.

Objections?

-Andrea