Re: Scheduler fairness problem on 2.6 series

From: Con Kolivas
Date: Wed Aug 11 2004 - 06:28:37 EST


Prakash K. Cheemplavam wrote:
Con Kolivas wrote:
| I tried this on the latest staircase patch (7.I) and am not getting any
| output from your script when tested up to 60 threads on my hardware. Can
| you try this version of staircase please?
|
| There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|
| http://ck.kolivas.org/patches/2.6/2.6.8/

Hi,

I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
and compute. Is the compute stuff functional? I tried setting it to 1
within X and after that X wasn't usable anymore (meaning it looked like
locked up, frozen/gone mouse cursor even). I managed to switch back to
console and set it to 0 and all was OK again.

Compute is very functional. However it isn't remotely meant to be run on a desktop because of very large scheduling latencies (on purpose).

The interactive to 0 setting helped me with runnign locally multiple
processes using mpi. Nevertheless (only with interactive 1 regression to
vanilla scheduler, else same) can't this be enhanced?

I don't understand your question. Can what be enhanced?

Details: I am working on a load balancing class using mpi. For testing
purpises I am running multiple processes on my machine. So for a given
problem I can say, it needs x time to solve. Using more processes opn a
single machine, this time (except communication and balancing overhead)
shouldn't be much larger. Unfortunately this happens. Eg. a given
probelm using two processes needs about 20 seconds to finish. But using
8 it already needs 47s (55s with interactiv set to 1). No, my balancing
framework is quite good. On a real (small, even larger till 128 nodes
tested) cluster overhead is just as low as 3% to 5%, ie. it scales quite
linearly.

Once again I dont quite understand you. Are you saying that there is more than 50% cpu overhead when running 8 processes? Or that the cpu is distributed unfairly such that the longest will run for 47s?

Any idea how to tweak the staircase to get near the 20 seconds with more
processes? Or is this rather a problem of mpich used locally?

Compute mode is by far the most scalable mode in staircase for purely computational tasks. The cost is that of interactivity; it is bad on purpose since it is a no-compromise maximum cpu cache utilisation policy.

If you like I can send you my code to test (beware it is not that small).

Cheers,

Prakash

Cheers,
Con

Attachment: signature.asc
Description: OpenPGP digital signature