Re: time slice cfq comments

From: Ingo Molnar
Date: Sat Dec 11 2004 - 04:17:22 EST



* Con Kolivas <kernel@xxxxxxxxxxx> wrote:

> Hi Jens
>
> Just thought I'd make a few comments about some of the code in your
> time sliced cfq.

(this code was actually a quick hack from me.)

> + if (p->array)
> + return min(cpu_curr(task_cpu(p))->time_slice,
> + (unsigned int)MAX_SLEEP_AVG);
>
> MAX_SLEEP_AVG is basically 10 * the average time_slice so this will
> always return task_cpu(p)->time_slice as the min value (except for the
> race you described in your comments). What you probably want is

the min() is there only to avoid ridiculous results caused by the
runqueue race, nothing else. Basically I didn't want to lock the runqueue
to do something that is an estimation anyway, and rq->curr might be
invalid. This was a proof-of-concept thing I wrote for Jens; if it works
out then I think we want to lock the runqueue nevertheless, so that we
don't dereference possibly deallocated tasks (and don't trip up things
like DEBUG_PAGEALLOC).
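
To illustrate what the min() buys us, here is a minimal userspace sketch.
The names clamp_racy_estimate() and SKETCH_MAX_SLEEP_AVG are made up for
this example and are not in the patch, and the ceiling value is an
assumption. It only shows that a garbage value read without the lock gets
bounded; it does nothing about the unsafe dereference itself, which is why
the lock is still wanted.

```c
/* Hypothetical ceiling standing in for MAX_SLEEP_AVG (value assumed). */
#define SKETCH_MAX_SLEEP_AVG 1000u

/*
 * Bound a value that was read without holding the lock that protects it.
 * A torn or stale read can be arbitrarily large; clamping keeps the
 * estimate inside a sane range without making the read itself safe.
 */
static unsigned int clamp_racy_estimate(unsigned int raw, unsigned int ceiling)
{
	return raw < ceiling ? raw : ceiling;
}
```

A plausible value such as 25 passes through unchanged, while something
like 0xdeadbeef comes back as the ceiling.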

> Further down you do:
> + /*
> + * for blocked tasks, return half of the average sleep time.
> + * (because this is the average sleep-time we'll see if we
> + * sample the period randomly.)
> + */
> + return NS_TO_JIFFIES(p->sleep_avg) / 2;
>
> unfortunately p->sleep_avg is a non-linear value (weighted upwards
> towards MAX_SLEEP_AVG). I suspect here you want
>
> + return NS_TO_JIFFIES(p->sleep_avg) / MAX_BONUS;

sleep_avg might be nonlinear, but it is nevertheless an estimate of the
sleep time of a task. It's different if the task is interactive. We
cannot know how long the task really will sleep; what we want is a good
guess. I didn't want to complicate things too much, as long as the
ballpark figure is right (i.e. as long as the function returns '0' for
on-runqueue tasks, returns a large value for long sleepers and returns
something in between for short/medium sleepers). We can later on
refine it with things like looking at p->timestamp to figure out how
long the task has been sleeping (in which case ->sleep_avg is perhaps no
longer authoritative), but I kept it simple & stupid for now.
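
The ballpark contract described above can be sketched in userspace as
follows. This is not the patch itself: struct sketch_task,
sketch_sleep_estimate() and the SKETCH_* constants are hypothetical
stand-ins, HZ=1000 is an assumption, and the on-runqueue branch simply
returns 0 as the contract states rather than reproducing the quoted
min() expression.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the kernel's depend on its config. */
#define SKETCH_HZ 1000ULL
#define SKETCH_NS_PER_JIFFY (1000000000ULL / SKETCH_HZ)

/* Hypothetical stand-in for the two task fields the heuristic looks at. */
struct sketch_task {
	int on_runqueue;              /* stands in for p->array != NULL */
	unsigned long long sleep_avg; /* average sleep time, in nanoseconds */
};

/* Ballpark guess, in jiffies, of how much longer the task will sleep. */
static unsigned long long sketch_sleep_estimate(const struct sketch_task *p)
{
	assert(p != NULL);

	/* A runnable task is not expected to sleep at all. */
	if (p->on_runqueue)
		return 0;

	/* A blocked task: half of its average sleep period. */
	return p->sleep_avg / SKETCH_NS_PER_JIFFY / 2;
}
```

With these assumptions a task averaging 20 ms of sleep yields 10 jiffies
and one averaging 900 ms yields 450, so short and long sleepers stay
clearly ordered even if the absolute numbers are rough.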

> I don't see any need for / 2.

the need for /2 is this: ->sleep_avg tells us (roughly) the average
_full_ sleep period. The CFQ IO-scheduler samples the task at a random
point _sometime_ during that period. So on average the task will sleep
for another half of the sleep average. Ok?
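
Spelled out as a short calculation, under the simplifying assumptions
that the full sleep period has a fixed length S and that the sampling
instant t is uniformly distributed over it (S, t and R are symbols
introduced only for this note):

```latex
% R is the sleep time remaining at the sampling instant t.
\begin{align*}
  t &\sim \mathrm{Uniform}[0, S], \qquad R = S - t \\
  \mathbb{E}[R] &= \frac{1}{S}\int_0^S (S - t)\,dt
                 = \frac{1}{S}\left(S^2 - \frac{S^2}{2}\right)
                 = \frac{S}{2}
\end{align*}
```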

Ingo