Re: workqueue cpu affinity

From: Max Krasnyansky
Date: Wed Jun 11 2008 - 16:44:20 EST


Previous emails were very long :). So here is an executive summary of the
discussions so far:

----
Workqueue kthread starvation by non-blocking user RT threads.

Starving workqueue threads on the isolated cpus does not seem like a big
deal. All current mainline users of the schedule_on_cpu() kind of API can
live with it. Starvation of the workqueue threads is an issue for the -rt
kernels, though.
See http://marc.info/?l=linux-kernel&m=121316707117552&w=2 for more info.
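
For context, the trigger is nothing exotic. A minimal userspace sketch
(cpu number picked arbitrarily, needs root) that starves the per-cpu
workqueue thread on an isolated cpu looks like this:

  #define _GNU_SOURCE
  #include <sched.h>

  int main(void)
  {
          struct sched_param sp = { .sched_priority = 50 };
          cpu_set_t mask;

          CPU_ZERO(&mask);
          CPU_SET(2, &mask);      /* assume cpu 2 is the isolated cpu */
          sched_setaffinity(0, sizeof(mask), &mask);
          sched_setscheduler(0, SCHED_FIFO, &sp);

          for (;;)
                  ;               /* never blocks, so events/2 never runs */
  }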

If absolutely necessary, moving workqueue threads off the isolated cpus is
also not a big deal, even with cpu hotplug. It's certainly _not_ encouraged
in general, but at the same time it is not strictly prohibited either,
because nothing fundamental breaks (that's what my current isolation
solution does).
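
(The mechanics of moving them are trivial; a kernel-side sketch, where
unbind_wq_thread() and its arguments are made-up names:)

  #include <linux/sched.h>
  #include <linux/cpumask.h>

  /* rebind an isolated cpu's workqueue thread to the other online cpus */
  static void unbind_wq_thread(struct task_struct *wq_thread, int isolated_cpu)
  {
          cpumask_t mask = cpu_online_map;

          cpu_clear(isolated_cpu, mask);
          set_cpus_allowed_ptr(wq_thread, &mask);
  }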

----
Optimize workqueue flush.

The current flush_workqueue() implementation is an issue for the starvation
case mentioned above, and in general it is not very efficient because it has
to schedule work on each online cpu.

Peter suggested rewriting flush logic to avoid scheduling on each online cpu.

Oleg suggested converting the existing users of flush_scheduled_work() to
cancel_work_sync(work), which provides fine-grained flushing and does not
schedule on each cpu.

Both suggestions would improve overall performance and address the case
where the machine gets stuck due to workqueue thread starvation.
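
To make the second suggestion concrete, here is a sketch of the conversion
for a hypothetical driver teardown path (my_work is a placeholder for
whatever work item the driver actually queues):

  #include <linux/workqueue.h>

  static struct work_struct my_work;      /* hypothetical work item */

  static void my_driver_teardown(void)
  {
          /* before: flush_scheduled_work() waits for _everything_
           * queued on keventd, i.e. it runs the workqueue thread on
           * every online cpu */

          /* after: wait only for (and cancel) our own work item;
           * no other cpu has to be touched */
          cancel_work_sync(&my_work);
  }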

----
Timer or IPI based Oprofile.

Currently oprofile collects samples using schedule_work_on_cpu(), which
means that if the workqueue threads are starved on, or moved from, cpuX,
oprofile fails to collect samples on that cpu.

It seems that it could easily be converted to use a per-CPU timer or an IPI.
This might be useful in general (i.e. less expensive) and would take care of
the issue described above.
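
A rough sketch of the IPI variant (not oprofile's actual code;
sample_this_cpu() is made up, and on_each_cpu() is used in its current
(func, info, retry, wait) form):

  #include <linux/smp.h>

  static void sample_this_cpu(void *unused)
  {
          /* runs with interrupts disabled on each cpu; record the
           * pc/registers into the per-cpu buffer here */
  }

  static void oprofile_sample_all_cpus(void)
  {
          /* an IPI does not depend on any kthread being runnable, so
           * starved or rebound workqueue threads no longer matter */
          on_each_cpu(sample_this_cpu, NULL, 0, 1);
  }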

----
Optimize pagevec drain.

The current pagevec drain logic on NUMA boxes schedules work on each online
cpu. It's not an issue for CPU isolation per se, but it can be improved in
general.
Peter suggested keeping a cpumask of the cpus with non-empty pagevecs, which
would avoid scheduling work on each cpu.
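Something along these lines, perhaps (a sketch of the idea, not Peter's
actual code; all names are made up, and the per-cpu work init is elided):

  #include <linux/cpumask.h>
  #include <linux/percpu.h>
  #include <linux/workqueue.h>

  static cpumask_t pagevec_cpus;  /* cpus whose pagevecs are non-empty */
  static DEFINE_PER_CPU(struct delayed_work, drain_work);

  /* called (with preemption off) when a page goes into this cpu's pagevec */
  static void mark_pagevec_used(void)
  {
          cpu_set(smp_processor_id(), pagevec_cpus);
  }

  /* schedule drain work only where there is something to drain */
  static void drain_marked_pagevecs(void)
  {
          int cpu;

          for_each_cpu_mask(cpu, pagevec_cpus) {
                  cpu_clear(cpu, pagevec_cpus);
                  schedule_delayed_work_on(cpu, &per_cpu(drain_work, cpu), 0);
          }
  }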
I wonder if there is something on that front in Nick's latest patches.
CC'ing Nick.

----

Did I miss anything?


Max