Re: [PATCH for 2.5] preemptible kernel

From: Rusty Russell (rusty@rustcorp.com.au)
Date: Tue Mar 20 2001 - 03:43:50 EST


In message <Pine.LNX.4.05.10103141653350.3094-100000@cosmic.nrg.org> you write:
> Kernel preemption is not allowed while spinlocks are held, which means
> that this patch alone cannot guarantee low preemption latencies. But
> as long held locks (in particular the BKL) are replaced by finer-grained
> locks, this patch will enable lower latencies as the kernel also becomes
> more scalable on large SMP systems.

Hi Nigel,

        I can see three problems with this approach, only one of which
is serious.

The first is that code which is already SMP-unsafe now becomes a
problem for everyone, not just for the 0.1% who run SMP machines. I
consider this a good thing for 2.5, though.

The second is that there are "manual" locking schemes which are used
in several places in the kernel which rely on non-preemptability;
de-facto spinlocks if you will. I consider all these uses flawed: (1)
they are often subtly broken anyway, (2) they make reading those parts
of the code much harder, and (3) they break when things like this are
done.
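
To make the second point concrete, here is a hypothetical sketch (the
names are illustrative, not from any real driver) of the kind of
"de-facto spinlock" I mean: a check-then-set that is only atomic
because nothing can preempt it on a non-preemptible uniprocessor.

```c
/* Hypothetical "manual locking" pattern.  On a non-preemptible UP
 * kernel the test and the store below are atomic by accident: no
 * other task can run between them.  With kernel preemption, a task
 * switch can land between the "if" and the assignment, and two
 * callers can both believe they won. */
static int reserved;            /* 0 = free, 1 = taken */

int try_reserve(void)
{
        if (reserved)
                return 0;       /* someone already has it */
        reserved = 1;           /* claim it -- the unprotected window */
        return 1;
}
```

With preemption enabled this needs a real spinlock (with preemption
disabled while it is held) or an atomic test-and-set, which is exactly
why these implicit schemes break when something like this patch is done.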

The third is that preemptibility conflicts with the naive
quiescent-period approach proposed for module unloading in 2.5, which
is also useful for several other things (eg. hotplugging CPUs). This
method relies on the fact that once a schedule() has occurred on every
CPU, no one can still be holding certain references. The simplest
example is a singly linked list: you can traverse it without a lock as
long as you don't sleep, and then someone can unlink a node and wait
for a schedule on every other CPU before freeing it. The non-SMP case
is a noop. See synchronize_kernel() below.
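
The linked-list case might look like the following userspace sketch
(synchronize_kernel_stub() merely stands in for the real
synchronize_kernel(), which waits for a schedule() on every CPU; the
list helpers are my own illustrative names):

```c
#include <stdlib.h>

/* Singly linked list protected by the quiescent-period scheme:
 * readers traverse without locks but never sleep; a writer unlinks a
 * node, waits for every CPU to schedule, then frees it. */
struct node {
        struct node *next;
        int key;
};

static struct node *head;

static void list_add(int key)
{
        struct node *n = malloc(sizeof(*n));
        n->key = key;
        n->next = head;
        head = n;
}

/* Reader: safe without a lock as long as it does not sleep. */
static int list_contains(int key)
{
        struct node *n;
        for (n = head; n != NULL; n = n->next)
                if (n->key == key)
                        return 1;
        return 0;
}

/* Stand-in for the real synchronize_kernel() below. */
static void synchronize_kernel_stub(void) { }

/* Writer: unlink first, so new readers cannot find the node; then
 * wait out the quiescent period, so old readers have finished; only
 * then is it safe to free. */
static void list_remove(int key)
{
        struct node **pp, *n;
        for (pp = &head; (n = *pp) != NULL; pp = &n->next) {
                if (n->key == key) {
                        *pp = n->next;             /* unlink */
                        synchronize_kernel_stub(); /* wait for readers */
                        free(n);
                        return;
                }
        }
}
```

The whole trick is the ordering: unlink, wait, free. Nothing here
works if a reader can be preempted mid-traversal and hold its
reference across a schedule.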

This, too, is soluble, but it means that synchronize_kernel() must
guarantee that each task which was running or preempted in kernel
space when it was called has been non-preemptively scheduled before
synchronize_kernel() can exit. Icky.
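
One hedged way to picture that stronger guarantee (the state names and
helpers here are purely illustrative, not a proposed implementation):
each task carries a mark saying whether it was preempted in kernel
space, the mark is cleared only by a voluntary schedule(), and
synchronize_kernel() may not return while any mark is still set.

```c
#define NTASKS 4

enum task_state {
        RUNNING_USER,           /* not in the kernel: no reference held */
        PREEMPTED_IN_KERNEL,    /* may still hold a reference */
        VOLUNTARILY_SCHEDULED,  /* passed a real quiescent point */
};

static enum task_state task_state[NTASKS];

/* A blocking schedule() is a genuine quiescent point: the task cannot
 * be holding a traversal reference across it. */
static void voluntary_schedule(int task)
{
        task_state[task] = VOLUNTARILY_SCHEDULED;
}

/* synchronize_kernel() may only exit once no task is still marked as
 * preempted in kernel space. */
static int may_exit_synchronize(void)
{
        int i;
        for (i = 0; i < NTASKS; i++)
                if (task_state[i] == PREEMPTED_IN_KERNEL)
                        return 0;
        return 1;
}
```

The icky part is that a preemptive schedule no longer counts as a
quiescent point, so the simple per-CPU "has scheduled" test below is
no longer sufficient on its own.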

Thoughts?
Rusty.

--
Premature optmztion is rt of all evl. --DK

/* We could keep a schedule count for each CPU and make idle tasks
   schedule (some don't unless need_resched), but this scales quite
   well (eg. 64 processors, average time to wait for first schedule =
   jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...

   At 1024 cpus, this is about 7.5 jiffies.  And that assumes no one
   schedules early. --RR */
void synchronize_kernel(void)
{
	unsigned long cpus_allowed, policy, rt_priority;

	/* Save current state */
	cpus_allowed = current->cpus_allowed;
	policy = current->policy;
	rt_priority = current->rt_priority;

	/* Create an unreal time task. */
	current->policy = SCHED_FIFO;
	current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

	/* Make us schedulable on all CPUs. */
	current->cpus_allowed = (1UL << smp_num_cpus) - 1;

	/* Eliminate current cpu, reschedule */
	while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
		schedule();

	/* Back to normal. */
	current->cpus_allowed = cpus_allowed;
	current->policy = policy;
	current->rt_priority = rt_priority;
}



This archive was generated by hypermail 2b29 : Fri Mar 23 2001 - 21:00:14 EST