[PATCH 0/3] patches for stop_machine

From: Hidetoshi Seto
Date: Mon Apr 28 2008 - 21:25:33 EST


Hi Rusty and all,

This patch set proposes minor improvements to kernel/stop_machine.c:

[PATCH 1/3] stop_machine: short exit path for if we cannot create enough threads
[PATCH 2/3] stop_machine: add timeout for child thread deployment
[PATCH 3/3] stop_machine: add stopmachine_timeout sysctl entry

The main topic is "how about adding a timeout to stop_machine?"
I think it would act as a safety net when a CPU cannot be stopped.

For example (a silly situation), the system can hang in the following way:

# ./silly.sh
run an evil loop task on AP
pid 6138's current affinity mask: ff
pid 6138's new affinity mask: fe
to pretend lock up, chrt -f -p 99 6138
loop[6138] is on CPU #4
to do stopmachine, try to off #7
echo 0 > /sys/devices/system/cpu/cpu7/online
(never return)

With this patch set applied, the hang is avoided:

# ./silly.sh
:
echo 0 > /sys/devices/system/cpu/cpu7/online
stopmachine: Failed to stop machine in time(5s). Are there any CPUs on file?
./silly.sh: line 22: echo: write error: Device or resource busy
offline is failed
OK, kill evil loop[6138]
try to off #7 again
echo 0 > /sys/devices/system/cpu/cpu7/online
CPU #7 is now offline
done!
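
As a rough illustration of the safety-net idea, here is a user-space sketch
in C. It is not the kernel code from the patches. All names
(stop_all_with_timeout, worker, struct checkin) are hypothetical, and the
return values are assumptions, except that -EBUSY is chosen to match the
"Device or resource busy" error in the transcript above. A coordinator
starts one worker per "CPU", waits until every worker has checked in, and
gives up once the deadline passes. A "stuck" worker stands in for a CPU
that is monopolized by the evil loop task.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* All names below are hypothetical; none are taken from the patches. */
struct checkin {
	atomic_int acked;	/* number of workers that have checked in */
};

struct worker_arg {
	struct checkin *c;
	int stuck;		/* nonzero: simulate a CPU whose worker never runs */
};

static void *worker(void *p)
{
	struct worker_arg *a = p;

	if (!a->stuck)
		atomic_fetch_add(&a->c->acked, 1);
	return NULL;
}

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Start n workers, the first n_stuck of which never check in.
 * Returns 0 if all workers checked in before the deadline,
 * -EBUSY on timeout (assumed, to match the transcript),
 * -EAGAIN if a thread could not be created (assumed),
 * -EINVAL for an unsupported n.
 */
static int stop_all_with_timeout(int n, int n_stuck, long timeout_ms)
{
	enum { MAX_WORKERS = 16 };
	pthread_t tid[MAX_WORKERS];
	struct worker_arg args[MAX_WORKERS];
	struct checkin c;
	struct timespec start, nap = { 0, 1000000 };	/* poll every 1 ms */
	int i, ret = 0;

	if (n < 1 || n > MAX_WORKERS)
		return -EINVAL;

	atomic_init(&c.acked, 0);
	for (i = 0; i < n; i++) {
		args[i].c = &c;
		args[i].stuck = i < n_stuck;
		if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0) {
			/* Short exit path: reap what was started and bail out. */
			while (i-- > 0)
				pthread_join(tid[i], NULL);
			return -EAGAIN;
		}
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (atomic_load(&c.acked) < n) {
		if (elapsed_ms(&start) >= timeout_ms) {
			ret = -EBUSY;	/* safety net: give up instead of hanging */
			break;
		}
		nanosleep(&nap, NULL);
	}

	for (i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
	return ret;
}
```

Built with something like "cc -pthread", stop_all_with_timeout(4, 0, 1000)
is expected to return 0, while stop_all_with_timeout(4, 1, 50) returns
-EBUSY after about 50 ms instead of waiting forever.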

Please refer to the description of each patch for details.
All comments are welcome.

Thanks,
H.Seto