lowish-lat for 2.4.0-test10-pre3

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sat Oct 21 2000 - 08:58:00 EST


Patch is at

http://www.uow.edu.au/~andrewm/linux/2.4.0-test10-pre3-low-latency.patch

Changes:

- Moved the set_current_state()/schedule() code out of line. Saves
  a couple of hundred bytes.

- Simplified some code in dcache.c - the performance benefit
  wasn't worth the ugliness.

- Put a couple of rescheduling points in the /proc files. This
  is because reading /proc/meminfo is part of Benno's test suite :)

- There were some complaints about the decision to disable this
  patch for SMP, so it can now be re-enabled via CONFIG_LOLAT_SMP
  (Kernel Hacking menu).

  But no promises here at all. It may seem to work, but if you
  start getting heavy spinlock contention in the kernel it
  goes bad.

- Fixed (I believe) the reschedule and signal race in
  arch/i386/kernel/entry.S. This is the only x86-specific
  part of this patch.

  I simply stuck a `cli' in there:

--- linux-2.4.0-test10-pre3/arch/i386/kernel/entry.S Sat Oct 14 17:02:03 2000
+++ linux-akpm/arch/i386/kernel/entry.S Sat Oct 21 22:15:47 2000
@@ -215,21 +215,27 @@
         jne handle_softirq
         
 ret_with_reschedule:
- cmpl $0,need_resched(%ebx)
- jne reschedule
- cmpl $0,sigpending(%ebx)
- jne signal_return
+ cli
+ movl need_resched(%ebx),%eax
+ orl sigpending(%ebx),%eax
+ jne signal_or_resched
 restore_all:
         RESTORE_ALL
 
         ALIGN
-signal_return:
+signal_or_resched:
+ cmpl $0,need_resched(%ebx)
+ jne reschedule
+ # Must be a pending signal
         sti # we can get here from an interrupt handler
         testl $(VM_MASK),EFLAGS(%esp)
         movl %esp,%eax
         jne v86_signal_return
         xorl %edx,%edx
         call SYMBOL_NAME(do_signal)
+ cli
+ cmpl $0,need_resched(%ebx)
+ jne reschedule
         jmp restore_all
 
         ALIGN
@@ -285,6 +291,7 @@
         
         ALIGN
 reschedule:
+ sti
         call SYMBOL_NAME(schedule) # test
         jmp ret_from_sys_call

  On the Mendocino the `cli' adds ~15 cycles to system calls. The
  cunning removal of a conditional jump from the syscall and interrupt
  fastpath reduces this to 13 cycles.

  So `getpid()' now takes 1.005x as long as it used to. So shoot me.
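
  To make the race concrete, here is a rough user-space model of the
  exit path, not kernel code. Every name in it (exit_model,
  take_irq_if_enabled, resched_missed) is made up for illustration, and
  it assumes that an interrupt which returns to kernel mode only sets
  need_resched and never calls schedule() itself. An interrupt is
  raised right after the flags are tested: without the `cli' it is
  taken in the window before RESTORE_ALL and the reschedule is missed;
  with the `cli' it stays pending until the task is back in user mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the syscall exit path.
 * None of these names are kernel APIs. */
struct exit_model {
    int  need_resched;   /* models need_resched(%ebx) */
    int  sigpending;     /* models sigpending(%ebx) */
    bool irqs_enabled;   /* models the CPU interrupt flag */
    bool irq_waiting;    /* an interrupt whose handler wakes another task */
};

/* Take the waiting interrupt if the CPU allows it. Assumption: an
 * interrupt that returns to kernel mode sets need_resched but does not
 * call schedule() itself. */
static void take_irq_if_enabled(struct exit_model *m)
{
    if (m->irqs_enabled && m->irq_waiting) {
        m->irq_waiting = false;
        m->need_resched = 1;
    }
}

/* Walk the exit path once, with an interrupt raised immediately after
 * the flags are tested. Returns true if the task reaches user mode with
 * need_resched set and untested, i.e. the reschedule was missed. */
bool resched_missed(bool use_cli)
{
    struct exit_model m = { 0, 0, true, false };

    if (use_cli)
        m.irqs_enabled = false;                        /* cli */

    bool work = (m.need_resched | m.sigpending) != 0;  /* movl + orl */

    m.irq_waiting = true;                  /* interrupt raised here */
    take_irq_if_enabled(&m);

    if (work)
        return false;          /* would branch to signal_or_resched */

    /* RESTORE_ALL: back to user mode. With cli, the interrupt is still
     * waiting and is taken from user mode, where the interrupt return
     * path tests need_resched again. */
    return m.need_resched != 0;
}
```

  In this model resched_missed(false) reports a missed reschedule and
  resched_missed(true) does not. It only shows the ordering argument;
  it says nothing about the cycle counts.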

- Some testing results on 2.4.0-test9.

  These tests were run with the separate tcp_minisocks patch because
  the reaping of timed-wait sockets interferes with lmbench.
  See http://www.uow.edu.au/~andrewm/linux/schedlat.html#ddt

  Machine: 566 MHz UP Celeron, 256M RAM, UDMA66
  Workload: 2 instances of bonnie++
            2 instances of lmbench
            1 instance of mmap001, mmap002, mmap001, mmap002, ...
            1 instance of netperf
            1 SCHED_FIFO process handling a 1024 Hz signal stream

  After 20 hours the latency histogram was:

   0-1 milliseconds: ~78,000,000
   1-2 milliseconds:           3
   5-8 milliseconds:           8 (kmem_cache_reap->kmem_slab_destroy->avl_remove)
  9-10 milliseconds:           3 (ide_intr)

  The kmem_cache_reap thing is due to the insane number of
  inodes and dentries which bonnie++ leaves around. It doesn't
  happen normally.

  Not sure why the IDE ISR had those glitches.
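
  For anyone wanting to bin their own measurements the same way, below
  is a minimal sketch of a millisecond histogram. It is not the tool
  used for the run above; lat_hist, lat_bucket, lat_record,
  lat_count_at_least and expected_samples are hypothetical names, and
  it uses fixed 1 ms buckets where the table above merges some ranges.
  As a sanity check on the numbers: 1024 Hz for 20 hours is
  1024 * 3600 * 20 = 73,728,000 signals, the same order as the
  ~78,000,000 reported, and 3 + 8 + 3 = 14 of them took longer
  than 1 ms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buckets 0..9 hold latencies in [i, i+1) ms; bucket 10 is overflow. */
#define NBUCKETS 11

struct lat_hist {
    uint64_t count[NBUCKETS];
};

/* Map a latency in microseconds to its bucket index. */
size_t lat_bucket(uint64_t latency_us)
{
    uint64_t ms = latency_us / 1000;
    return ms < NBUCKETS - 1 ? (size_t)ms : (size_t)(NBUCKETS - 1);
}

/* Record one latency sample. */
void lat_record(struct lat_hist *h, uint64_t latency_us)
{
    assert(h != NULL);
    h->count[lat_bucket(latency_us)]++;
}

/* Count the samples whose latency is at least threshold_ms. */
uint64_t lat_count_at_least(const struct lat_hist *h, size_t threshold_ms)
{
    assert(h != NULL && threshold_ms < NBUCKETS);
    uint64_t n = 0;
    for (size_t i = threshold_ms; i < NBUCKETS; i++)
        n += h->count[i];
    return n;
}

/* Number of signals a rate_hz stream delivers in the given hours. */
uint64_t expected_samples(uint64_t rate_hz, uint64_t hours)
{
    return rate_hz * 3600u * hours;
}
```

  A measuring process would call lat_record() once per signal with the
  delay between the expected and actual delivery time, then print
  count[] at the end of the run.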