Re: Interesting scheduling times

Richard Gooch (rgooch@atnf.csiro.au)
Tue, 22 Sep 1998 23:31:08 +1000


Larry McVoy writes:
> : You make it sound like I don't know what I'm measuring. This is
> : incorrect.
>
> Umm, err, if that is true, then you should have an answer as to why
> your tests have such wild variance. My claim that you don't know
> what you are measuring is based on your varying results being
> dismissed with hand waving. I'm not sure if you understand how most
> people go about benchmarking, but I believe that they all pretty
> much know what the answer is going to be ahead of time. When the
> results come back, it either confirms you knew what you were talking
> about or it tells you that you don't understand what is going on. I
> respectfully suggest that you fall into the latter camp and will
> stay there until you can explain your variance and prove your
> explanation.

No, I'm not dismissing the variance. I've suggested possible
mechanisms which can contribute to it.
Of course I don't yet understand all the sources of the variance.
Neither do you (that's clear from the man page for lat_ctx). What I'm
doing is exploring some possible causes. But you have made out that
because my test shows variance it must be flawed. That simply isn't
true. It means my test is sensitive to various effects, which is
useful to me because it lets me probe the system.

Note that I've also added code to compute the median and I still get
significant variability.
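
To illustrate the kind of median step I mean (a sketch only; the names
are made up and this is not the actual test code):

#include <stdlib.h>

/* Sketch only: median of n timing samples. */
static int cmp_long (const void *a, const void *b)
{
    long x = *(const long *) a;
    long y = *(const long *) b;

    return (x > y) - (x < y);
}

static long median (long *samples, unsigned int n)
{
    qsort (samples, n, sizeof (*samples), cmp_long);
    if (n % 2) return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2;
}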

I've managed to isolate another cause of the variance: processes being
put on the run queue. I've added code to check how many processes are
on the run queue just before and just after the benchmark loop runs. I
get anywhere from 2 to 6 processes! The number of processes on the run
queue actually changes over the course of the benchmark!
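
To show the kind of check I mean, here is a rough user-space sketch
that approximates the run queue length by reading the "running/total"
field from /proc/loadavg (the function name is made up; this is not
the actual instrumentation in my test):

#include <stdio.h>

/* Sketch: approximate number of runnable processes. */
static int runnable_processes (void)
{
    FILE *fp = fopen ("/proc/loadavg", "r");
    double l1, l5, l15;
    int running = -1, total;

    if (fp == NULL) return -1;
    if (fscanf (fp, "%lf %lf %lf %d/%d",
                &l1, &l5, &l15, &running, &total) != 5) running = -1;
    fclose (fp);
    return running;
}

Call it just before and just after the benchmark loop and compare the
two values.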

Some immediate candidates are my shell, xterm and X server. Add to
this my xclock, xmeter and xload, which are all periodically placed on
the run queue.

The reason your benchmark is not sensitive is that you run your
processes at normal priority, whereas I run mine at RT priority. That
means that, for you, any process which wakes up is likely to do its
work and come off the run queue quickly. For me, those processes stay
on the run queue until the benchmark finishes (exits). Since I'm
specifically interested in RT context switching, I can't just drop
back to normal process priorities.
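
For reference, going to RT priority means something along these lines
(a sketch; the priority value and error handling are illustrative, and
it needs root):

#include <sched.h>
#include <stdio.h>

/* Sketch: put the calling process into the SCHED_FIFO (RT) class. */
static int go_realtime (void)
{
    struct sched_param sp;

    sp.sched_priority = sched_get_priority_max (SCHED_FIFO);
    if (sched_setscheduler (0, SCHED_FIFO, &sp) < 0)
    {
        perror ("sched_setscheduler");
        return -1;
    }
    return 0;
}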

> I don't really care what you do, personally. But if you show up on
> this list with that benchmark and expect people to change the kernel
> based on results you can't explain, then I'm gonna speak up and
> point out the flaws.

I can't come up with an exhaustive list of causes, because these
systems are too complex. But I've come up with a couple of reasonable
candidates.

> I'm not saying you are right or that you are wrong. I'm saying your
> results are essentially uninteresting until you can explain what
> they mean, all of them, not just the ones that you want to present.

I think I'm getting a good handle on the causes of the variance. It
takes time to track them down.

> : Using these values I can then look at where time can be
> : saved. Further, my tests have shown variability, which has led me to
> : investigate the cause of that variability. If the variability is due
> : to caching problems, I can look at how to minimise the effects.
>
> Come on, think about it. What's in the cache that could cause this
> much variance? You keep saying caching problems and I keep telling
> you that that can't be it; and I can prove it by demonstrating a
> benchmark that measures what you claim to measure and doesn't have
> the variance. Not only that, you can think about how much code and
> data is involved here and actually work out exactly what the number
> should be. Your mins are close but your averages and maxes are way
> out of line. And I can't think of an explanation that would include
> those out of line numbers. And you haven't either. So go back, do
> the homework and figure out what is going on when things vary that
> much. Something must be getting weird, what is it? Why don't other
> benchmarks see it?

Yes, something is weird. That doesn't mean my test code is flawed; it
means the test is sensitive to different effects.

> It's a useful exercise, by the way. Every time I'm in your shoes
> (and I've been there a lot), I gain a great deal of insight by
> figuring out what is going on. Who knows, maybe you'll bump into
> some great discovery in the process. Stranger things have happened.

Well, the whole point of this exercise has been to get a handle on
context switch times and how they can be improved. Along the way my
test code is exposing some subtle effects that I think are worth
investigating.

> : I've been able
> : to reorder struct task_struct and reduce the context switch latency as
> : a result of my analysis (from 0.2 us per run queue process to 0.15
> : us).
>
> I believe that if you go back and read what I wrote in one of the first
> postings on this, I suggested that and suggested that you could get it
> down to one cache line miss.

I had already been considering this approach. I missed the "one cache
line" bit, though. By my reckoning, the minimum is two cache lines
(assuming 32 byte cache lines: is that right?). At least, it's two
cache lines without changing the hardcoded stuff: the part where it
says "don't touch".
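
To illustrate the idea (this is a toy, not the real task_struct): pull
the fields the scheduler's inner loop reads together so they share as
few 32 byte cache lines as possible.

/* Toy example only.  Keep the fields the scheduling loop reads
   (counter, priority, policy, the run queue links) adjacent, so they
   fall into one or two 32 byte cache lines instead of being scattered
   through the structure. */
struct toy_task
{
    /* hot in the scheduling loop: keep together at the front */
    long counter;
    long priority;
    unsigned long policy;
    struct toy_task *next_run;
    struct toy_task *prev_run;

    /* cold fields follow: their placement matters much less */
    int pid;
    char comm[16];
    /* ... */
};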

> : Part of this is the interrupt latency. I've seen people looking at
> : this on the list, but I've not seen much attention being paid to how
> : long it takes to wake a process up and have it start running
> : (i.e. switch out whatever is running now and switch in my RT task).
>
> The reason that nobody is looking at it is because it isn't a significant
> problem. People are looking at interrupt latency because it needs to
> be faster.

Everything needs to be faster.

> : What matters is that when it has
> : to wake up, it does so as quickly as possible. One of the overheads
> : here is the length of the run queue. Having separate run queues will
> : mean you are one step closer to safely putting more "normal" load on a
> : machine without worrying about what it does to your RT latencies.
>
> You are worried about the wrong problem. You're dealing with a 3rd or
> 4th or 5th order term while ignoring the first order terms.

No, as I said, others are looking at other aspects. I've decided to
look at scheduling overheads.

> I have a question for you. What do you think would happen if you
> took that benchmark you have and actually touched some data between
> each sched_yield()?

The times would go up, of course. Not all processes touch a lot of
data, though.
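
Something along these lines would show it (a sketch; the working set
size and iteration count are arbitrary knobs):

#include <sched.h>
#include <string.h>

#define WORKING_SET (16 * 1024)
#define ITERATIONS  10000

static char buf[WORKING_SET];

/* Dirty WORKING_SET bytes between context switches, to pull cache
   effects into the measurement. */
static void yield_loop (void)
{
    int i;

    for (i = 0; i < ITERATIONS; i++)
    {
        memset (buf, i, sizeof (buf));   /* touch the working set */
        sched_yield ();
    }
}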

> I have another question for you. How about adding a RT run queue
> and run your system (not your benchmark) on the modified system. If
> what you think is true, you ought to be able to measure a throughput
> difference, right? So let's get some numbers from a real
> application, not a flawed benchmark.

It's latency I care about. Throughput is secondary.

Regards,

Richard....
