Re: [PATCH v2 for-4.12-fixes 1/2] sched/fair: Use task_groups instead of leaf_cfs_rq_list to walk all cfs_rqs

From: Tim Chen
Date: Wed May 24 2017 - 19:40:53 EST

On 05/09/2017 09:17 AM, Tejun Heo wrote:
> Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
> live cfs_rqs which have ever been active on the CPU; unfortunately,
> this makes update_blocked_averages() O(total number of CPU cgroups)
> which isn't scalable at all.
>
> The next patch will make rq->leaf_cfs_rq_list only contain the cfs_rqs
> which are currently active. In preparation, this patch converts users
> which need to traverse all cfs_rqs to use task_groups list instead.
>
> task_groups list is protected by its own lock and allows RCU protected
> traversal and the order of operations guarantees that all online
> cfs_rqs will be visited, but holding rq->lock won't protect against
> iterating an already unregistered cfs_rq. However, the operations of
> the two users that get converted - update_runtime_enabled() and
> unthrottle_offline_cfs_rqs() - should be safe to perform on already
> dead cfs_rqs, so adding rcu read protection around them should be
> enough.
>
> Note that print_cfs_stats() is not converted. The next patch will
> change its behavior to print out only active cfs_rqs, which is
> intended as there's not much point in printing out idle cfs_rqs.
>
> v2: Dropped strong synchronization around removal and left
> print_cfs_stats() unchanged as suggested by Peterz.

Tejun,

We did some preliminary testing of this patchset with a well-known
database benchmark on a 4-socket Skylake server system.
It provides a 3.7% throughput boost, which is significant for
this benchmark.
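
For anyone following along, the conversion described in the quoted
changelog (walking every task group under RCU read protection and
touching that group's per-CPU cfs_rq, rather than walking
rq->leaf_cfs_rq_list) can be modeled roughly as below. This is only a
userspace sketch, not the kernel patch: the struct layouts, the NR_CPUS
value, the model_rcu_read_lock()/model_rcu_read_unlock() placeholders
and unthrottle_offline_cfs_rqs_model() are all hypothetical names made
up for illustration, and a plain singly linked list stands in for the
task_groups list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2   /* assumed CPU count for the model */

/* Simplified stand-ins for the kernel structures; fields are illustrative. */
struct cfs_rq {
    bool runtime_enabled;
    bool throttled;
};

struct task_group {
    struct cfs_rq cfs_rq[NR_CPUS];  /* one cfs_rq per CPU */
    struct task_group *next;        /* models the global task_groups list */
};

/* No-op placeholders standing in for rcu_read_lock()/rcu_read_unlock(). */
static void model_rcu_read_lock(void) { }
static void model_rcu_read_unlock(void) { }

/*
 * Visit the given CPU's cfs_rq in every task group, active or not,
 * disable its runtime accounting and clear any throttle.
 * Returns the number of cfs_rqs visited.
 */
static int unthrottle_offline_cfs_rqs_model(struct task_group *groups, int cpu)
{
    int visited = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);

    model_rcu_read_lock();
    for (struct task_group *tg = groups; tg != NULL; tg = tg->next) {
        struct cfs_rq *cfs_rq = &tg->cfs_rq[cpu];

        cfs_rq->runtime_enabled = false;
        cfs_rq->throttled = false;
        visited++;
    }
    model_rcu_read_unlock();

    return visited;
}
```

The point of the model is that the cost of this walk scales with the
number of task groups, which is acceptable for the rare offline path,
while the hot path no longer needs the full list.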

Thanks.

Tim