Re: dynamic-hz

From: Mitchell Blank Jr
Date: Thu Dec 16 2004 - 08:59:05 EST


Gabriel Paubert wrote:
> So the options are:
> 1) microseconds, allows up to roughly half an hour (signed)
> or an hour (unsigned).
> 2) nanoseconds, needs 64 bits, nice for 64 bit machines but
> at the risk of bloat on 32 bit ones.
> 3) timespecs, somewhat wasteful on 64 bit machines (two longs).

Also forgive me if this has already been discussed (I might have missed
some of the messages on these threads) but there's also Paul Henning
Kamp's "bintime" format used in FreeBSD:
http://phk.freebsd.dk/pubs/timecounter.pdf

I'm not convinced it's the right solution for this problem but the paper
does make a lot of good points.

I also agree that any new timer API needs to have entry points for users
that can handle an imprecise wakeup -- running multiple wakeups at once
is important at high load.

Another idea I've been toying around in my head related to this (but
would need some instrumentation to prove) I bet on heavy server loads
there's a lot of timeouts where:
1. Requested timeout is always >N seconds
2. Timeouts almost always get canceled well before (N/2) seconds have
passed

If this is the case you could make a pretty simple hack -- on each CPU keep
two list_head's of timers -- lets call them "add_list" and "prev_list".
Now every N/2 seconds do:
tmp = prev_list
prev_list = add_list
add_list = EMPTY
...and then add all the timers on "tmp" to the normal timer queue.

The advantage here is that to add a timer you just have to insert it onto
add_list -- you don't have to keep these lists in order. By the time
we get around to adding the timers on "tmp" to the main timer queue we know:
1. They are still waiting to expire (at most "N" seconds have elapsed
since they were inserted)
2. Most of them have been canceled (since at least "N/2" seconds have
passed) and were thus removed from the list. "tmp" should not
have many elements remaining

I think for some class of timeouts (device timeouts, network, etc) this
should be pretty efficient. You'd have to do a bunch of instrumenting
to see if there are enough timers with these characteristics to make this
useful (and what would make a good value for 'N') I've got a zillion other
projects so I'm not going to have a chance to do this, but maybe it'll
give someone else some ideas.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/