Re: [RFC][PATCH] Improving directed yield scalability for PLE handler

From: Raghavendra K T
Date: Sat Sep 15 2012 - 12:13:38 EST

On 09/14/2012 10:40 PM, Andrew Jones wrote:
On Thu, Sep 13, 2012 at 04:30:58PM -0500, Andrew Theurer wrote:
On Thu, 2012-09-13 at 17:18 +0530, Raghavendra K T wrote:
* Andrew Theurer<habanero@xxxxxxxxxxxxxxxxxx> [2012-09-11 13:27:41]:


On picking a better vcpu to yield to: I really hesitate to rely on
a paravirt hint [telling us which vcpu is holding a lock], but I am not
sure how else to reduce the candidate vcpus to yield to. I suspect we
are yielding to way more vcpus than are preempted lock-holders, and that
IMO is just work accomplishing nothing. Trying to think of a way to
further reduce candidate vcpus....

wrt yielding to vcpus on the same cpu, I recently noticed that
there's a bug in yield_to_task_fair(). yield_task_fair() calls
clear_buddies(), so if we're yielding to a task that has been running on
the same cpu that we're currently running on, and thus is also on the
current cfs runqueue, then our 'who to pick next' hint is getting cleared
right after we set it.
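
The ordering problem described above can be sketched with a small toy
model. This is only an illustration, not the kernel code: the toy_*
structures and functions are hypothetical stand-ins, and the model simply
assumes, as the paragraph above states, that clearing the buddies drops
the "next" hint. The real clear_buddies() has its own conditions that are
not reproduced here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scheduling entity (not struct sched_entity). */
struct toy_entity {
    const char *name;
};

/* Hypothetical stand-in for a cfs runqueue holding only the two hints. */
struct toy_cfs_rq {
    struct toy_entity *next; /* "please run this one next" hint */
    struct toy_entity *skip; /* "please do not pick this one" hint */
};

/* Assumption taken from the mail: clearing buddies drops the next hint. */
static void toy_clear_buddies(struct toy_cfs_rq *rq)
{
    rq->next = NULL;
    rq->skip = NULL;
}

static void toy_set_next_buddy(struct toy_cfs_rq *rq, struct toy_entity *se)
{
    rq->next = se;
}

/* Yielding clears the hints, then marks the current task as skippable. */
static void toy_yield_task(struct toy_cfs_rq *rq, struct toy_entity *curr)
{
    toy_clear_buddies(rq);
    rq->skip = curr;
}

/* Hint first, then yield: the yield wipes the hint that was just set. */
static struct toy_entity *toy_yield_to_hint_first(struct toy_cfs_rq *rq,
                                                  struct toy_entity *curr,
                                                  struct toy_entity *target)
{
    toy_set_next_buddy(rq, target);
    toy_yield_task(rq, curr);
    return rq->next;
}

/* Yield first, then hint: the hint is still in place afterwards. */
static struct toy_entity *toy_yield_to_yield_first(struct toy_cfs_rq *rq,
                                                   struct toy_entity *curr,
                                                   struct toy_entity *target)
{
    toy_yield_task(rq, curr);
    toy_set_next_buddy(rq, target);
    return rq->next;
}
```

In this model, toy_yield_to_hint_first() returns NULL (the hint is gone),
while toy_yield_to_yield_first() returns the target. In both cases the
skip hint still points at the yielding task.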

I had hoped that the patch below would show a general improvement in the
vcpu overcommit performance; however, the results were variable - no worse,
no better. Your results above show good improvement from interleaving
vcpus across the cpus, which means a decent percentage of these types of
yields were going on. Since the patch didn't change much, that indicates
that the next hinting isn't generally taken too seriously by the
scheduler. Anyway, the patch should correct the code per its design, and
testing shows that it didn't make anything worse, so I'll post it soon.
Also, in order to try and improve how far set-next can jump ahead in the
queue, I tested a kernel with group scheduling compiled out (libvirt uses
cgroups and I'm not sure how autogroups may affect things). I did get a
slight improvement with that, but nothing to write home to mom about.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c219bf8..7d8a21d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3037,11 +3037,12 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p, bool preemp
 	if (!se->on_rq || throttled_hierarchy(cfs_rq_of(se)))
 		return false;
 
+	/* We're yielding, so tell the scheduler we don't want to be picked */
+	yield_task_fair(rq);
+
 	/* Tell the scheduler that we'd really like pse to run next. */
 	set_next_buddy(se);
 
-	yield_task_fair(rq);
-
 	return true;

Hi Drew, I agree with your fix and tested the patch too. The results are
pretty much the same; I'm puzzled as to why.

Thinking... maybe we hit this when #vcpu (of a VM) > #pcpu?
(pigeonhole principle ;)).
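
The pigeonhole argument can be made concrete with a small helper. This is
just a sketch: the function name is hypothetical, and it only computes the
arithmetic lower bound, saying nothing about where the host scheduler
actually places vcpus. If a VM has nvcpu vcpus that are all runnable on
npcpu physical cpus, some pcpu must hold at least ceil(nvcpu / npcpu) of
them.

```c
/*
 * Lower bound on how many vcpus of one VM must share the busiest pcpu
 * when all nvcpu vcpus are placed on npcpu pcpus: ceil(nvcpu / npcpu).
 * Returns 0 when npcpu is 0, since no placement exists in that case.
 */
static unsigned int toy_min_vcpus_on_busiest_pcpu(unsigned int nvcpu,
                                                  unsigned int npcpu)
{
    if (npcpu == 0)
        return 0;
    return nvcpu / npcpu + (nvcpu % npcpu != 0);
}
```

A result of 2 or more means at least two vcpus of the VM are forced onto
the same pcpu, which is the situation where a same-cpu yield can happen.
With nvcpu <= npcpu the bound is at most 1, so nothing forces such sharing,
although it may still occur depending on placement.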
