the attached patch solves an SMP task migration bug in the O(1) scheduler.
the bug is triggered all the time on an 8-way system i tested, and there are
also some bug reports of dual-CPU boxes locking up during bootup.

task migration is a subtle issue. The basic job is to 'send' the
currently executing task over to another CPU, where it continues
execution. The current code in 2.5.3-pre2 is broken: it has a window in
which a ksoftirqd thread can run on two CPUs at once, causing a lockup.

my solution is to send the task to the other CPU using the
smp_migrate_task() lowlevel-SMP method defined by the SMP architecture.
The other CPU then calls back into the scheduler via
sched_task_migrated(new_task).

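as a rough illustration of the hand-off described above, here is a toy userspace model, not the patch itself. Only the names smp_migrate_task() and sched_task_migrated() come from this mail; the structures, the one-slot mailbox standing in for the cross-CPU message, the extra this_cpu argument, and the helpers migrate_task() and handle_migration_message() are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2
#define NO_CPU  (-1)

/* Toy stand-ins (assumed); not the kernel's task_struct or runqueue. */
struct task {
    int on_rq;              /* CPU whose runqueue holds the task, or NO_CPU */
    struct task *next;
};

struct runqueue {
    struct task *head;      /* runnable tasks on this CPU */
    struct task *mailbox;   /* task sent by another CPU, not yet received */
};

static struct runqueue rq[NR_CPUS];

/* Scheduler callback, run on the destination CPU once the task arrives.
 * The this_cpu argument is an assumption of this model. */
static void sched_task_migrated(int this_cpu, struct task *new_task)
{
    assert(new_task->on_rq == NO_CPU);  /* must be off every runqueue */
    new_task->next = rq[this_cpu].head;
    rq[this_cpu].head = new_task;
    new_task->on_rq = this_cpu;
}

/* Models the arch-level send as a one-slot mailbox per CPU (assumed). */
static void smp_migrate_task(int dest_cpu, struct task *t)
{
    assert(rq[dest_cpu].mailbox == NULL);
    rq[dest_cpu].mailbox = t;
}

/* Hypothetical helper: the destination CPU handling the message. */
static void handle_migration_message(int this_cpu)
{
    struct task *t = rq[this_cpu].mailbox;

    rq[this_cpu].mailbox = NULL;
    if (t)
        sched_task_migrated(this_cpu, t);
}

/* Hypothetical helper: unlink the task on the source CPU, then send it. */
static void migrate_task(int src_cpu, int dest_cpu, struct task *t)
{
    struct task **pp = &rq[src_cpu].head;

    while (*pp && *pp != t)
        pp = &(*pp)->next;
    assert(*pp == t);       /* task must be on the source runqueue */
    *pp = t->next;
    t->next = NULL;
    t->on_rq = NO_CPU;      /* in flight: runnable on no CPU */
    smp_migrate_task(dest_cpu, t);
}
```

the point of the model is that the task sits on at most one runqueue at any time: it is unlinked on the source side before the message is sent, and only the destination CPU puts it back.
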
it's also possible to move a task from one runqueue to another without
using cross-CPU messaging, but this increases the overhead of the
scheduler hot-path: schedule_tail() would need to check for some sort of
prev->state == TASK_MIGRATED flag, at least. The patch solves this without
adding overhead to the hot-path.

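to make the hot-path cost of that alternative concrete, here is a minimal sketch of what such a check could look like. TASK_MIGRATED is the hypothetical flag mentioned above; the struct layout, the dest_cpu field, the counter, and the function name schedule_tail_with_flag() are assumptions for illustration and not code from any kernel tree.

```c
#include <assert.h>

/* TASK_MIGRATED is hypothetical; the real kernel state values differ. */
enum task_state { TASK_RUNNING, TASK_MIGRATED };

struct task {
    enum task_state state;
    int cpu;        /* CPU the task currently belongs to */
    int dest_cpu;   /* assumed field: where a flagged task should go */
};

/* Counts how often the extra test runs, to show it is paid every time. */
static unsigned long hot_path_checks;

/* Toy version of the tail of a context switch with the extra check. */
static void schedule_tail_with_flag(struct task *prev)
{
    hot_path_checks++;                      /* paid on every switch */
    if (prev->state == TASK_MIGRATED) {     /* rare case */
        prev->cpu = prev->dest_cpu;
        prev->state = TASK_RUNNING;
    }
}
```

in this model every context switch pays for the comparison even though migrations are rare, which is the overhead the cross-CPU message approach avoids.
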
the patch is also an optimization: the set_cpus_allowed() function used to
switch the current idle thread manually, to initiate a reschedule (due to
locking issues). This 'manual context switching' code is gone, and
set_cpus_allowed() now calls schedule() directly. This simplifies
set_cpus_allowed() greatly, reducing sched.c's line count by 10.

Ingo
This archive was generated by hypermail 2b29 : Wed Jan 23 2002 - 21:00:36 EST