Re: Interesting scheduling times - NOT

Larry McVoy (lm@bitmover.com)
Tue, 22 Sep 1998 23:03:10 -0600


Richard Gooch <rgooch@atnf.csiro.au>:
: There is a *reason* the variance is there. The question is what is
: causing it.

Yup. I've been asking you to explain that for many messages now.
I've also tried to say, over and over and over, that it is not normal
for there to be that kind of variance, and that it has to be a bug in your
benchmark - every other context switch benchmark I've ever seen doesn't
behave like yours; they are very stable. For every theory you've dreamed
up, I've demonstrated that it is incorrect. At what point do you just want
to stop arguing and start debugging your benchmark?

: Maybe I have a "complex" because your first (private) message to me
: started off with "What the fuck are you talking about?". And you've
: been pretty aggressive about the whole thing and have used emotive
: words.

If I've hurt your feelings, my apologies (as I said immediately in
private mail as well, nice of you to leave that out).

: Even if the variances in my measurement turn out to be due to a broken
: test or sensitivity to some "uninteresting" effect, there is no
: denying that an increased run queue length slows down context switch
: times. With my test I get a cost of 0.2 us per process and with yours
: I get 0.19 us (PPro 180).

I couldn't agree more. So maybe rather than trying to fix an operating
system that isn't broken, you should focus on getting rid of all those
extra processes. A lot of my frustration with this argument is that
I've been through this with customer after customer at Sun and SGI.
They all want you to change the operating system to suit their application
when what they should do is write their application correctly in the
first place. Too many processes are going to screw you up no matter
what queue they are on and no matter what scheduler you use.

This discussion reminds me of the C++ people who keep asking for bigger
hardware caches so their code will run fast. They are constantly amazed
when you show them fewer lines of C code that solve the problem they
are trying to solve, using less cache and running faster. You tell us -
what's the right thing to do - fix the hardware or fix the C++?

: If you're not interested in being constructive (i.e. "I think your
: variance may be due to XYZ effect"), then why not get off my back and
: let me get on with tracking down the problem?
: If I propose a mechanism which I think may explain the variance, and
: you don't agree, all you need to do is say so and why. Repeating the
: mantra "your test is broken" is pointless.

You keep asking me to debug it. Please remember that I'm not at some
cushy job, I'm spending my savings working on tools for Linux kernel
developers. It's not like I have tons of idle cycles to waste on
your code. I've already put out the effort, for free, to make as
portable and accurate a benchmark as I possibly can - a benchmark,
by the way, that measures a lot more than just toy context switches.
It's been a tremendous amount of work over quite a few years for which
I've not received a dime. It's bad enough making sure that my code
is not full of bugs - explain to me what possible reason I could have
for debugging your code. I think it's enough that I've proved that a
properly written test won't show that variance.

I think that the onus is on you to debug your own code. And until you
can show that you have it debugged, maybe it would be better if you
deferred to the code which has been shown to work.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/