Re: [PATCH 03/10] writeback: Do not congestion sleep if there are no congested BDIs or significant writeback

From: Mel Gorman
Date: Mon Sep 13 2010 - 06:30:30 EST


On Mon, Sep 13, 2010 at 07:20:37PM +0900, Minchan Kim wrote:
> On Mon, Sep 13, 2010 at 7:07 PM, Mel Gorman <mel@xxxxxxxxx> wrote:
> > On Mon, Sep 13, 2010 at 06:48:10PM +0900, Minchan Kim wrote:
> >> >> > > > <SNIP>
> >> >> > > > I'm not saying it is. The objective is to identify a situation where
> >> >> > > > sleeping until the next write or congestion clears is pointless. We have
> >> >> > > > already identified that we are not congested so the question is "are we
> >> >> > > > writing a lot at the moment?". The assumption is that if there is a lot
> >> >> > > > of writing going on, we might as well sleep until one completes rather
> >> >> > > > than reclaiming more.
> >> >> > > >
> >> >> > > > This is the first effort at identifying pointless sleeps. Better ones
> >> >> > > > might be identified in the future but that shouldn't stop us making a
> >> >> > > > semi-sensible decision now.
> >> >> > >
> >> >> > > nr_bdi_congested is no problem since we have used it for a long time.
> >> >> > > But you added a new rule about writeback.
> >> >> > >
> >> >> >
> >> >> > Yes, I'm trying to add a new rule about throttling in the page allocator
> >> >> > and from vmscan. As you can see from the results in the leader, we are
> >> >> > currently sleeping more than we need to.
> >> >>
> >> >> I can see the results about avoiding congestion_wait but can't find
> >> >> results for the (writeback < inactive / 2) heuristic.
> >> >>
> >> >
> >> > See the leader and each of the report sections entitled
> >> > "FTrace Reclaim Statistics: congestion_wait". It provides a measure of
> >> > how sleep times are affected.
> >> >
> >> > "congest waited" are waits due to calling congestion_wait. "conditional waited"
> >> > are those related to wait_iff_congested(). As you will see from the reports,
> >> > sleep times are reduced overall while callers of wait_iff_congested() still
> >> > go to sleep. The reports entitled "FTrace Reclaim Statistics: vmscan" show
> >> > how reclaim is behaving and indicators so far are that reclaim is not hurt
> >> > by introducing wait_iff_congested().
> >>
> >> I saw the result.
> >> It showed the effectiveness of _both_ nr_bdi_congested and
> >> (writeback < inactive/2).
> >> What I mean is the effectiveness of (writeback < inactive/2) _alone_.
> >
> > I didn't measure it because such a change means that wait_iff_congested()
> > ignores BDI congestion. If we were reclaiming on a NUMA machine for example,
> > it could mean that a BDI gets flooded with requests if we only checked the
> > ratios of one zone if little writeback was happening in that zone at the
> > time. It did not seem like a good idea to ignore congestion.
>
> You seem to have misunderstood me.
> Sorry for the unclear sentence.
>
> I don't mean we should ignore congestion.
> First of all, we should consider BDI congestion.
> What I mean is whether we need to add the (nr_writeback < nr_inactive / 2)
> heuristic on top of the BDI congestion check.

Early tests indicated "yes".

> It wasn't in the previous version of your patch but it showed up in this version.
> So I thought you must have some evidence for why we should add such a heuristic.
>

Only the feedback from the first patch where Johannes posted a workload that
did exhibit a problem. Isolated tests on just that workload led to the
(nr_writeback < inactive / 2) change.
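
For anyone following along, the decision being discussed can be sketched
roughly as below. This is a standalone userspace illustration and not the
code from the patch: the struct, its field names and should_congestion_sleep()
are made up for the example, and the counters are assumed to be sampled by
the caller for the zone being reclaimed.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters the heuristic looks at. */
struct reclaim_snapshot {
	unsigned long nr_congested_bdis; /* BDIs currently marked congested */
	unsigned long nr_writeback;      /* pages under writeback */
	unsigned long nr_inactive;       /* pages on the inactive lists */
};

/*
 * Return true if the caller should sleep until congestion clears or a
 * write completes, false if sleeping looks pointless and reclaim may as
 * well continue.
 */
bool should_congestion_sleep(const struct reclaim_snapshot *s)
{
	/* BDI congestion is always considered first. */
	if (s->nr_congested_bdis > 0)
		return true;

	/*
	 * No congestion: only skip the sleep when less than half of the
	 * inactive pages are under writeback.
	 */
	return s->nr_writeback >= s->nr_inactive / 2;
}
```

With no congested BDI, 10 pages under writeback out of 100 inactive pages
would skip the sleep, while 50 out of 100 would still back off.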

> >
> >> If we remove the (writeback < inactive / 2) check and unconditionally
> >> return, how does the behavior change?
> >>
> >
> > Based on just the workload Johannes sent, scanning and completion times both
> > increased without any improvement in the scanning/reclaim ratio (a bad result)
> > hence why this logic was introduced to back off where there is some
> > writeback taking place even if the BDI is not congested.
>
> Yes. That's what I want. At the least, the function's comment should explain
> this so the logic can be understood. In addition, it would be better to add
> numbers showing how well the back-off works.
>

Very well. I'll hold off posting v2 of the series for now then because producing
such results takes many hours and my machines are currently busy.
Hopefully I'll have something by Wednesday.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab