Re: Interesting scheduling times - NOT

Richard Gooch (rgooch@atnf.csiro.au)
Tue, 22 Sep 1998 21:19:08 +1000


Kurt Garloff writes:
>
> On Fri, Sep 18, 1998 at 10:51:16AM -0600, Larry McVoy wrote:
> > ..
> > another 2.4. To get to your 28 usecs, there would have to be 11 cache
> > misses (in both L1 and L2) per link walked. If that's the case, my
> > apologies and I'll go yell at Linus. But I don't think that's the case.
> > I just walked the code to make sure and it looks like 6-7 cache misses
> > to me: the fields referenced are (in order of reference):
> >
> > has_cpu // referenced in can_schedule()
> > policy // referenced in goodness()
> > counter // referenced in goodness()
> > processor // referenced in goodness()
> > mm // referenced in goodness()
> > priority // referenced in goodness()
> > next_run // referenced in schedule()
> >
> > processor is right next to has_cpu so if they don't cross a boundary,
> > that's one cache line.
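[The boundary caveat above can be checked mechanically. A minimal sketch, with illustrative offsets rather than real task_struct layout: two adjacent fields share a line only when the second field's last byte still falls in the line the first one starts in.]

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 32  /* L1 line size on the CPUs being discussed */

/* Returns nonzero when a field at off_a and a field of size_b bytes at
 * off_b land in the same cache line (i.e. they don't cross a boundary). */
int same_cache_line(size_t off_a, size_t off_b, size_t size_b)
{
    return off_a / CACHE_LINE == (off_b + size_b - 1) / CACHE_LINE;
}
```

[For example, 4-byte fields at offsets 0 and 4 share a line, while fields at offsets 28 and 32 straddle the 0x20 boundary and cost two misses.]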
> >
> > ...
> >
> > An interesting test would be to go whack sched.h to put the listed fields
> > all right next to each other, starting on a cache line boundary, in the
> > order listed above. It's 20 bytes total, that is just 2 cache lines
> > instead of 6-7. If that made a big difference, you'd have a case to make
> > that the run queue hurts. Even if it doesn't make a big difference, it
> > would be a good thing - cache misses are bad and they get nastier on SMP.
> > IRIX has been heavily whacked to put all the data you need and the lock on
> > the same cache line for all the data structures that are commonly used.
> > They didn't do all that work because they were bored.
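[The experiment suggested above can be sketched as follows. This is a hypothetical stand-in, not the real 2.1-era task_struct: the seven listed fields are packed at the top of the struct, and the struct itself is aligned on a line boundary with the GCC `aligned` attribute.]

```c
#include <stddef.h>

#define CACHE_LINE 32

/* Sketch only: hot scheduler fields grouped at offset 0, colder fields
 * pushed behind them, whole struct aligned to a 32-byte line. */
struct task_sketch {
    int has_cpu;                  /* can_schedule() */
    long policy;                  /* goodness() */
    long counter;                 /* goodness() */
    int processor;                /* goodness() */
    void *mm;                     /* goodness() */
    long priority;                /* goodness() */
    struct task_sketch *next_run; /* schedule() */
    char colder_fields[200];      /* everything else follows */
} __attribute__((aligned(CACHE_LINE)));
```

[On a 32-bit kernel the seven fields total 28 bytes and fit in one 32-byte line; with 64-bit longs and pointers they span two lines, which is still well under the 6-7 lines of the unordered layout.]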
>
> Good idea, Larry. So I also spent some time without being bored ...
>
> Here are my own statistics:
> These fields are referenced in sched() for the previous task:
> * processor
> * need_resched
> * counter
> * policy
> * priority
> * state
> * sigpending (signal_pending)
> * timeout
> * has_cpu (SMP)
>
> These fields are referenced for each runnable task:
> * has_cpu (SMP) (can_schedule)
> * next_run
> and in goodness()
> * policy
> * rt_priority
> * counter
> * processor (SMP)
> * mm
> * priority
> if we recalc counters:
> * next_task (for_each_task)
>
> If the runq is changed this is referenced:
> * prev_run
>
> So I reordered the fields in sched.h: task_struct to be more cache friendly.
> All fields referenced within the runnable loop are within the second cache
> line (0x20--0x3f), whereas the other fields normally used in sched() are
> placed within the first cache line (0x00-0x1f). Given that the struct is
> properly aligned (i.e. to 2^5=0x20), we have a minimal cache footprint.
> (This is true for 32-bit machines only, because the sizes of long and of
> pointers are different on 64-bit machines.)
>
> I did make one change that some may dislike: I changed the type of the SMP
> fields from int to char, thus preventing machines with more than 128 CPUs
> from being happy. I don't know if this is an issue. The problem is I would
> not otherwise have been able to fit everything important into 0x40 bytes.
> It might be that lock_depth is the most critical and needs to be changed
> back to int? I'm no SMP expert. Only has_cpu and processor are critical to
> fit into the cache line.
>
> I didn't perform timings on the code, but I did make sure Linux (UP) works
> properly after applying the patch.
> Richard, could you perform your tests? You can also send me your test
> program. As the K6-2 L1 cache is only 2-way associative, IIRC, it should
> make some difference here, too.
> Linus, you may want to include it in the kernel, if it helps make
> sched() a little bit faster?

I've already done this myself (my patch does similar things to yours)
and timed it on my PPro 180. In a previous message I posted the patch
too. See:
http://www.atnf.csiro.au/~rgooch/benchmarks/

Regards,

Richard....

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/