Re: Interesting scheduling times - NOT

Kurt Garloff (garloff@kg1.ping.de)
Tue, 22 Sep 1998 13:12:19 +0200


On Fri, Sep 18, 1998 at 10:51:16AM -0600, Larry McVoy wrote:
> ..
> another 2.4. To get to your 28 usecs, there would have to be 11 cache
> misses (in both L1 and L2) per link walked. If that's the case, my
> apologies and I'll go yell at Linus. But I don't think that's the case.
> I just walked the code to make sure and it looks like 6-7 cache misses
> to me: the fields referenced are (in order of reference):
>
> has_cpu // referenced in can_schedule()
> policy // referenced in goodness()
> counter // referenced in goodness()
> processor // referenced in goodness()
> mm // referenced in goodness()
> priority // referenced in goodness()
> next_run // referenced in schedule()
>
> processor is right next to has_cpu so if they don't cross a boundary,
> that's one cache line.
>
> ...
>
> An interesting test would be to go whack sched.h to put the listed fields
> all right next to each other, starting on a cache line boundary, in the
> order listed above. It's 20 bytes total, that is just 2 cache lines
> instead of 6-7. If that made a big difference, you'd have a case to make
> that the run queue hurts. Even if it doesn't make a big difference, it
> would be a good thing - cache misses are bad and they get nastier on SMP.
> IRIX has been heavily whacked to put all the data you need and the lock
> on the same cache line for all the data structures that are commonly
> used. They didn't do all that work because they were bored.

Good idea, Larry. So I also spent some time without being bored ...

Here are my own statistics (a condensed sketch of the corresponding loop
follows the lists):
These fields are referenced in schedule() for the previous task:
* processor
* need_resched
* counter
* policy
* priority
* state
* sigpending (signal_pending)
* timeout
* has_cpu (SMP)

These fields are referenced for each runnable task:
* has_cpu (SMP) (can_schedule)
* next_run
and in goodness()
* policy
* rt_priority
* counter
* processor (SMP)
* mm
* priority
if we recalc counters:
* next_task (for_each_task)

If the runq is changed this is referenced:
* prev_run
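
To make the access pattern explicit, here is a condensed sketch of the
run queue walk in schedule() -- not the verbatim 2.1.122 code, just the
field references that matter for the cache (p, prev and this_cpu are as
in the real function):

	/* Condensed sketch of the schedule() inner loop. Every
	 * iteration touches the fields listed above, so each runnable
	 * task costs as many cache misses as cache lines those fields
	 * are spread over. */
	struct task_struct *p = init_task.next_run;
	struct task_struct *next = idle_task;
	int c = -1000;			/* best goodness seen so far */

	while (p != &init_task) {
		if (can_schedule(p)) {	/* reads p->has_cpu on SMP */
			/* goodness() reads p->policy, p->rt_priority,
			 * p->counter, p->processor (SMP), p->mm and
			 * p->priority */
			int weight = goodness(p, prev, this_cpu);
			if (weight > c)
				c = weight, next = p;
		}
		p = p->next_run;	/* the link walked per task */
	}
	/* if all counters have run down: for_each_task() via
	 * next_task, recalculating p->counter from p->priority */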

So I reordered the fields of task_struct in sched.h to be more cache
friendly. All fields referenced within the runnable-task loop are within
the second cache line (0x20-0x3f), whereas the other fields normally used
in schedule() are placed within the first cache line (0x00-0x1f). Given
that the struct is properly aligned (i.e. to 2^5 = 0x20), we have a
minimal cache footprint.
(This is true for 32-bit machines only, because the sizes of long and of
pointers are different on 64-bit machines.)
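
Distilled from the diff appended below, the head of the reordered struct
then looks like this (the offset comments assume 32-bit longs and
pointers and a struct starting on a 0x20-byte boundary):

	struct task_struct {
	/* 0x00: 1st cache line: fields used for the previous task */
		volatile long state;
		unsigned long flags;
		int sigpending;
		mm_segment_t addr_limit;
	/* 0x10: */
		struct exec_domain *exec_domain;
		long need_resched;
		unsigned long timeout;
		struct task_struct *prev_run;
	/* 0x20: 2nd cache line: fields used per runnable task */
		struct task_struct *next_run;
		struct task_struct *next_task;
		unsigned long policy, rt_priority;
	/* 0x30: */
		long counter, priority;
		struct mm_struct *mm;
		char processor, has_cpu, last_processor, lock_depth;
	/* 0x40: the rest is not needed in the scheduler hot path */
		struct task_struct *prev_task;
		/* ... */
	};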

I made one change that some may dislike: I changed the type of the SMP
fields from int to char, which keeps machines with more than 128 CPUs
from being happy. I don't know if this is an issue; the problem is that
otherwise I would not have been able to fit everything important into
0x40 bytes. It might be that lock_depth is the most critical of them and
needs to be changed back to int? I'm no SMP expert. Only has_cpu and
processor are critical to fit into the cache line.
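
If someone wants to guard against this layout silently rotting, a
compile-time check along the following lines would do. This is a
hypothetical snippet, not part of the patch; CHECK is the usual
negative-array-size trick, and the offsets hold for 32-bit
longs/pointers only:

	#include <stddef.h>
	#include <linux/sched.h>

	/* Hypothetical layout check, not part of the patch:
	 * compilation fails (negative array size) if a hot field
	 * leaves its intended cache line. */
	#define CHECK(expr) extern char layout_check[(expr) ? 1 : -1]

	CHECK(offsetof(struct task_struct, next_run)  == 0x20);
	CHECK(offsetof(struct task_struct, mm)        <  0x40);
	CHECK(offsetof(struct task_struct, processor) <  0x40);
	CHECK(offsetof(struct task_struct, has_cpu)   <  0x40);
	/* The char fields also limit processor numbers to what fits
	 * in a (possibly signed) char, i.e. 128 CPUs at most. */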

I didn't perform timings on the code, but I did make sure Linux (UP)
works properly after applying the patch.
Richard, could you perform your tests? You could also send me your test
program. As the K6-2 L1 cache is only 2-way associative, IIRC, it should
make some difference here, too.
Linus, you may want to include it in the kernel if it helps make
schedule() a little bit faster?

Appended is the diff against 2.1.122.

-- 
Kurt Garloff, Dortmund 
<K.Garloff@ping.de>
PGP key on http://student.physik.uni-dortmund.de/homepages/garloff
----------------------------------------------------------------------------
The following is a Python RSA implementation. According to the US Government
posting these four lines makes me an international arms trafficker!  Join me
in civil disobedience; add these lines of code to your .sig block to help get
this stupid and unconstitutional law changed.
============================================================================
from sys import*;from string import*;a=argv;[s,p,q]=filter(lambda x:x[:1]!=
'-',a);d='-d'in a;e,n=atol(p,16),atol(q,16);l=(len(q)+1)/2;o,inb=l-d,l-1+d
while s:s=stdin.read(inb);s and map(stdout.write,map(lambda i,b=pow(reduce(
lambda x,y:(x<<8L)+y,map(ord,s)),e,n):chr(b>>8*i&255),range(o-1,-1,-1)))

[Attachment: 21122-task_struct.diff (text/plain)]

--- linux/include/linux/sched.h.orig	Thu Sep 17 10:33:04 1998
+++ linux/include/linux/sched.h	Tue Sep 22 13:10:26 1998
@@ -207,7 +207,9 @@
 struct user_struct;
 
 struct task_struct {
-/* these are hardcoded - don't touch */
+/* First put all info important needed in schedule() for every task
+ * (according to L.McVoy) in order to be cache friendly.
+ * Offset comments are for true 32bit archs, only. K.Garloff, 98/09/22 */
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	unsigned long flags;	/* per process flags, defined below */
 	int sigpending;
@@ -215,19 +217,28 @@
 					0-0xBFFFFFFF for user-thead
 					0-0xFFFFFFFF for kernel-thread
 				 */
+/* 0x10: */
 	struct exec_domain *exec_domain;
 	long need_resched;
+	unsigned long timeout;
+	struct task_struct *prev_run;
+/* 0x20: */
+	struct task_struct *next_run;
+	struct task_struct *next_task;
+	unsigned long policy, rt_priority;
+/* 0x30: */
 
 /* various fields */
-	long counter;
-	long priority;
+	long counter, priority;
+/* memory management info */
+	struct mm_struct *mm;
 /* SMP and runqueue state */
-	int has_cpu;
-	int processor;
-	int last_processor;
-	int lock_depth;		/* Lock depth. We can context switch in and out of holding a syscall kernel lock... */
-	struct task_struct *next_task, *prev_task;
-	struct task_struct *next_run, *prev_run;
+	char processor, has_cpu, last_processor, lock_depth;
+	/* Lock depth. We can context switch in and out of holding a syscall kernel lock... */
+
+/* 0x40: */
+
+	struct task_struct *prev_task;
 
 /* task state */
 	struct linux_binfmt *binfmt;
@@ -258,7 +269,6 @@
 	struct task_struct **tarray_ptr;
 
 	struct wait_queue *wait_chldexit;	/* for wait4() */
-	unsigned long timeout, policy, rt_priority;
 	unsigned long it_real_value, it_prof_value, it_virt_value;
 	unsigned long it_real_incr, it_prof_incr, it_virt_incr;
 	struct timer_list real_timer;
@@ -295,8 +305,6 @@
 	struct fs_struct *fs;
 /* open file information */
 	struct files_struct *files;
-/* memory management info */
-	struct mm_struct *mm;
 /* signal handlers */
 	spinlock_t sigmask_lock;	/* Protects signal and blocked */
 	struct signal_struct *sig;
@@ -338,9 +346,12 @@
  */
 #define INIT_TASK \
 /* state etc */	{ 0,0,0,KERNEL_DS,&default_exec_domain,0, \
+/* timeout */	0,&init_task,&init_task,&init_task,\
+/* policy */	SCHED_OTHER,0, \
 /* counter */	DEF_PRIORITY,DEF_PRIORITY, \
-/* SMP */	0,0,0,-1, \
-/* schedlink */	&init_task,&init_task, &init_task, &init_task, \
+/* mm */	&init_mm, \
+/* SMP */	(char)0,(char)0,(char)0,(char)-1, \
+/* schedlink */	&init_task, \
 /* binfmt */	NULL, \
 /* ec,brk... */	0,0,0,0,0,0, \
 /* pid etc.. */	0,0,0,0,0, \
@@ -348,7 +359,7 @@
 /* pidhash */	NULL, NULL, \
 /* tarray */	&task[0], \
 /* chld wait */	NULL, \
-/* timeout */	0,SCHED_OTHER,0,0,0,0,0,0,0, \
+/* it */	0,0,0,0,0,0, \
 /* timer */	{ NULL, NULL, 0, 0, it_real_fn }, \
 /* utime */	{0,0,0,0},0, \
 /* per CPU times */ {0, }, {0, }, \
@@ -367,7 +378,6 @@
 /* tss */	INIT_TSS, \
 /* fs */	&init_fs, \
 /* files */	&init_files, \
-/* mm */	&init_mm, \
 /* signals */	SPIN_LOCK_UNLOCKED, &init_signals, {{0}}, {{0}}, NULL, &init_task.sigqueue, 0, 0, \
 }
