[RFC][PATCH RT 0/4] sched/rt: Lower rq lock contention latencies on many CPU boxes

From: Steven Rostedt
Date: Fri Dec 07 2012 - 19:09:26 EST

I've been debugging large latencies on a 40 core box and found that a major
cause is the thundering-herd-like grab of the rq lock by the
pull_rt_task() logic.

Basically, if a large number of CPUs lower their priority at roughly
the same time, they all trigger a pull. If there happens to be
only one overloaded CPU to pull a task from, all CPUs doing the pull will
try to grab that task. In doing so, they all contend on the rq lock of
the overloaded CPU. Only one CPU will succeed in pulling the task
and unfortunately, there's no quick way to know which, as it depends
on the affinity of the task that needs to be pulled, and to look at that,
we need to grab its rq lock!
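
As a rough illustration of the contention described above, here is a toy
userspace model. It is not the kernel code: the toy_* names, the single
queued task, and the lock counter that stands in for the rq lock are all
assumptions made for the sketch. It only shows that every pulling CPU
has to take the overloaded rq's lock before it can even look at the
task's affinity, while at most one of them gets the task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, not kernel code. All names are hypothetical. */
struct toy_rq {
	unsigned long lock_acquisitions; /* times this rq's "lock" was taken */
	bool has_pushable_task;          /* one queued RT task waiting to run */
	uint64_t allowed_mask;           /* affinity of that task, one bit per CPU */
};

/*
 * One CPU attempting a pull. It must take the source rq lock before it
 * can inspect the task's affinity. Returns true if the task was pulled.
 */
static bool toy_pull(struct toy_rq *src, unsigned int this_cpu)
{
	bool pulled = false;

	assert(this_cpu < 64);
	src->lock_acquisitions++;	/* stands in for taking the rq lock */
	if (src->has_pushable_task &&
	    (src->allowed_mask & (UINT64_C(1) << this_cpu))) {
		src->has_pushable_task = false;	/* task moves to this_cpu */
		pulled = true;
	}
	/* the lock would be dropped here */
	return pulled;
}

/*
 * Every CPU in [0, nr_cpus) except src_cpu tries to pull.
 * Returns how many of them actually got the task.
 */
static unsigned int toy_pull_storm(struct toy_rq *src, unsigned int src_cpu,
				   unsigned int nr_cpus)
{
	unsigned int cpu, winners = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == src_cpu)
			continue;
		if (toy_pull(src, cpu))
			winners++;
	}
	return winners;
}
```

In this model, calling toy_pull_storm() for 40 CPUs with a task that is
only allowed on one of them takes the lock 39 times for a single
successful pull.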

Instead of having the pull logic grab the rq locks and do the work to
switch the task over to the pulling CPU, this patch series (well, patch
#3) has the pulling CPU send an IPI to the overloaded CPU, and that
CPU does the push instead. The push logic uses the cpupri.c code
to quickly find the best CPU to offload the overloaded RT task to,
which makes this quite efficient.

Receiving multiple IPIs has a much lower overhead than all the CPUs
grabbing the rq lock.
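
To make the comparison concrete, here is a toy userspace model of the
IPI-driven push. It is only a sketch under stated assumptions, not the
code in patch #3: the toy_* names are hypothetical, a linear scan stands
in for the cpupri lookup, higher numbers mean higher priority, and the
model assumes the handler only needs a remote rq lock (the target's)
when it actually pushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 40
#define TOY_NO_CPU  (-1)

/* Toy userspace model, not kernel code. All names are hypothetical. */
struct toy_push_state {
	int cpu_prio[TOY_NR_CPUS];	/* prio each CPU runs at; higher wins (model assumption) */
	uint64_t allowed_mask;		/* affinity of the queued task */
	int task_prio;			/* priority of the queued task */
	bool has_pushable_task;		/* one RT task waiting on the overloaded CPU */
	unsigned long ipis_received;
	unsigned long remote_lock_acquisitions;
};

/* Stand-in for the cpupri lookup: lowest-priority allowed CPU the task can preempt. */
static int toy_find_lowest_cpu(const struct toy_push_state *s, int src_cpu)
{
	int cpu, best = TOY_NO_CPU;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (cpu == src_cpu)
			continue;
		if (!(s->allowed_mask & (UINT64_C(1) << cpu)))
			continue;
		if (s->cpu_prio[cpu] >= s->task_prio)
			continue;
		if (best == TOY_NO_CPU || s->cpu_prio[cpu] < s->cpu_prio[best])
			best = cpu;
	}
	return best;
}

/*
 * IPI handler running on the overloaded CPU (src_cpu).
 * Returns the CPU the task was pushed to, or TOY_NO_CPU.
 */
static int toy_push_ipi(struct toy_push_state *s, int src_cpu)
{
	int target;

	assert(src_cpu >= 0 && src_cpu < TOY_NR_CPUS);
	s->ipis_received++;
	if (!s->has_pushable_task)
		return TOY_NO_CPU;	/* an earlier IPI already pushed it */
	target = toy_find_lowest_cpu(s, src_cpu);
	if (target == TOY_NO_CPU)
		return TOY_NO_CPU;
	s->remote_lock_acquisitions++;	/* stands in for locking the target rq */
	s->has_pushable_task = false;
	s->cpu_prio[target] = s->task_prio;
	return target;
}
```

With 39 CPUs each sending one IPI and a task that can only run on one of
them, this model handles 39 IPIs but takes a remote lock once, and the
affinity check happens on the CPU that already owns the task.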

The other three patches are fixes/enhancements to the push/pull code
that I found while doing the debugging of the latencies.

Note, although this patch series is made for the -rt patch, the issues
apply to mainline as well. Because -rt has the migrate_disable() code,
this patch series is tailored to that, but if we can vet this out in
-rt, all this code should make its way quickly to mainline.

I tested this code out, but it probably needs some cleanup and definitely
more comments. I'm only posting this as an RFC for now to get feedback
on the idea.

Thanks!

-- Steve
