Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp uptime

From: Peter Zijlstra
Date: Tue May 24 2011 - 08:21:18 EST

Next message: David Fries: "Re: mmotm 2011-05-12-15-52 uploaded"
Previous message: Pengfei Zhang: "[PATCH] MIPS:i8259.c remove resume and shutdown to syscore_ops"
Next in thread: Peter Zijlstra: "Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp uptime"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Sorry for the delay, life got interesting and then it slipped my mind.

On Mon, 2011-04-18 at 16:59 +0200, Jan Kara wrote:
> Your formula is:
> p(j)=\sum_i x_i(j)/(t_i*2^{i+1})
> where $i$ sums from 0 to \infty, x_i(j) is the number of events of type
> $j$ in period $i$, $t_i$ is the total number of events in period $i$.

Actually:

p_j = \Sum_{i=0} (d/dt_i) * x_j / 2^(i+1)

[ discrete differential ]

Where x_j is the total number of events for the j-th element of the set
and t_i is the i-th last period.

Also, the 1/2^(i+1) factor ensures recent history counts heavier while
still maintaining a normalized distribution.

Furthermore, by measuring time in the same measure as the events we get:

t = \Sum_i x_i

which yields that:

p_j = x_j * {\Sum_i (d/dt_i)} * {\Sum 2^(-i-1)}
= x_j * (1/t) * 1

Thus

\Sum_j p_j = \Sum_j x_j / (\Sum_i x_i) = 1

> I want to compute
> l(j)=\sum_i x_i(j)/2^{i+1}
> g=\sum_i t_i/2^{i+1}
> and
> p(j)=l(j)/g

Which gives me:

p_j = x_j * \Sum_i 1/t_i
= x_j / t

Again, if we then measure t in the same events as x, such that:

t = \Sum_i x_i

we again get:

\Sum_j p_j = \Sum_j x_j / \Sum_i x_i = 1

However, if you start measuring t differently that breaks, and the
result is no longer normalized and thus not suitable as a proportion.

Furthermore, while x_j/t is an average, it does not have decaying
history, resulting in past behaviour always affecting current results.
The decaying history thing will ensure that past behaviour will slowly
be 'forgotten' so that when the media is used differently (seeky to
non-seeky workload transition) the slow writeout speed will be forgotten
and we'll end up at the high writeout speed corresponding to less seeks.
Your average will end up hovering in the middle of the slow and fast
modes.

> Clearly, all these values can be computed in O(1).

True, but you get to keep x and t counts over all history, which could
lead to overflow scenarios (although switching to u64 should mitigate
that problem in our lifetime).

> Now for t_i = t for every
> i, the results of both formulas are the same (which is what made me make my
> mistake).

I'm not actually seeing how the averages will be the same, as explained,
yours seems to never forget history.

> But when t_i differ, the results are different.

>From what I can tell, when you stop measuring t in the same events as x
everything comes down because then the sum of proportions isn't
normalized.

> I'd say that the
> new formula also provides a meaningful notion of writeback share although
> it's hard to quantify how far the computations will be in practice...

s/far/fair/ ?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: David Fries: "Re: mmotm 2011-05-12-15-52 uploaded"
Previous message: Pengfei Zhang: "[PATCH] MIPS:i8259.c remove resume and shutdown to syscore_ops"
Next in thread: Peter Zijlstra: "Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp uptime"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]