[PATCHSET] blkcg: unify blkgs for different policies

From: Tejun Heo
Date: Wed Feb 01 2012 - 16:28:20 EST

Hey, again.

Currently, blkcg policies have and manage their own blkgs, so blkgs
are per cgroup-queue-policy combination instead of cgroup-queue
combination. This leads to nasty problems. It isn't clear which parts
are common to both policies. There are unused duplicates in the common
part of blkg and it isn't trivial to tell which part is being used.
The separation also leads to duplicate logic in both policies which
makes the code difficult to follow, prone to subtle bugs and, most
importantly, hinders proper layering between blkcg core and policy
implementations.

Because locking, blkg management, elvswitch and policy
[de]registration are tightly woven, they are challenging to untangle.
Doing proper in-place policy data replacement requires locking
improvements, which in turn are painful to do while policy
implementations are doing their own things with blkgs.

As a transitional step, all blkgs other than the root one are shot
down on policy [de]registration and the root blkg is updated in place.
This is hackish but should get us through the locking update, after
which we can implement in-place update for all blkgs safely. While
this does introduce a race window while policies are being
[de]registered, this isn't anything new (e.g. none of the stat update
functions synchronize against policy update) and shouldn't cause any
actual problem given that blk-throttle can't be built as a module and
cfq-iosched is the default iosched on most installations.
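
As a rough illustration of the transitional step described above, here
is a minimal userspace sketch in C. It is not code from the patches:
every toy_* name, the policy_gen field and the list layout are made up
for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a blkg; field names are illustrative, not the kernel's. */
struct toy_blkg {
	bool is_root;		/* the root group survives policy changes */
	int policy_gen;		/* which policy set this group's data matches */
	struct toy_blkg *next;
};

/* Toy stand-in for a request queue owning a list of groups. */
struct toy_queue {
	struct toy_blkg *blkgs;
};

/* Link a new group into the queue; returns NULL on allocation failure. */
struct toy_blkg *toy_blkg_add(struct toy_queue *q, bool is_root, int gen)
{
	struct toy_blkg *blkg = calloc(1, sizeof(*blkg));

	if (!blkg)
		return NULL;
	blkg->is_root = is_root;
	blkg->policy_gen = gen;
	blkg->next = q->blkgs;
	q->blkgs = blkg;
	return blkg;
}

/*
 * Hypothetical hook run on policy [de]registration: destroy every
 * non-root group and refresh the root group in place. Returns the
 * number of groups destroyed.
 */
int toy_policy_changed(struct toy_queue *q, int new_gen)
{
	struct toy_blkg **pp;
	int destroyed = 0;

	assert(q != NULL);
	pp = &q->blkgs;
	while (*pp) {
		struct toy_blkg *blkg = *pp;

		if (blkg->is_root) {
			blkg->policy_gen = new_gen;	/* in-place update */
			pp = &blkg->next;
		} else {
			*pp = blkg->next;		/* unlink, then free */
			free(blkg);
			destroyed++;
		}
	}
	return destroyed;
}
```

The sketch only shows the shape of the operation. Anything touching a
group concurrently with toy_policy_changed() would race in this toy
version, which is the window the paragraph above refers to.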

This patchset was pretty painful but I think/hope things will be
easier from here on. Note that this patchset does add ~180 LOC. Some
of them are comments and it's expected to shrink again with further
cleanups and removal of transitional stuff.

Changes to come are:

* locking simplification

* proper in-place update of policy data for all blkgs on policy
[de]registration

* fix broken blkcg switch after throttling.

* use unified stats updated under queue lock and drop percpu stats
which should fix locking / context bug across percpu allocation.

* make set of applied policies per-queue

* move stats and conf into their owning policies and let blkcg core
provide generic framework / helper instead of hard coding all the
possible ones. This should be accompanied by cgroup updates to
allow changing files in cgroupfs. Not sure how this will turn out.

This patchset contains the following 11 patches.


0001-0003 are prep patches.

0004 shoots down all non-root blkgs on policy [de]registration.

0005 separates per-policy data from the common part of blkg and
allocates them separately from blkcg core.

0006-0010 collect common data fields and logic from policy
implementations into blkcg core.

0011 unifies blkgs so that there's one blkg per cgroup-queue pair.
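
The end state that 0005 and 0011 describe can be pictured with the
following minimal userspace sketch in C. It is a model under stated
assumptions, not the patch code: all toy_* names are hypothetical, it
assumes the core allocates each policy's private data using a size the
policy declares, and it omits locking and RCU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical policy slots; names and ids are not taken from the patches. */
enum { TOY_POL_A, TOY_POL_B, TOY_NR_POLS };

struct toy_cgroup { int id; };
struct toy_queue  { int id; };

/* What a policy tells the core: how much private data it needs per group. */
struct toy_policy {
	size_t pdata_size;
};

/* Common part: exactly one instance per (cgroup, queue) pair. */
struct toy_blkg {
	struct toy_cgroup *cgroup;
	struct toy_queue *queue;
	void *pd[TOY_NR_POLS];	/* per-policy data, allocated by the core */
	struct toy_blkg *next;
};

static const struct toy_policy *toy_policies[TOY_NR_POLS];
static struct toy_blkg *toy_blkg_list;

/* Free a group that is not (or no longer) linked into the list. */
void toy_blkg_free(struct toy_blkg *blkg)
{
	int i;

	for (i = 0; i < TOY_NR_POLS; i++)
		free(blkg->pd[i]);	/* free(NULL) is a no-op */
	free(blkg);
}

struct toy_blkg *toy_blkg_lookup(struct toy_cgroup *cg, struct toy_queue *q)
{
	struct toy_blkg *blkg;

	for (blkg = toy_blkg_list; blkg; blkg = blkg->next)
		if (blkg->cgroup == cg && blkg->queue == q)
			return blkg;
	return NULL;
}

/* Return the group for (cg, q), creating it and its policy data if needed. */
struct toy_blkg *toy_blkg_lookup_create(struct toy_cgroup *cg,
					struct toy_queue *q)
{
	struct toy_blkg *blkg = toy_blkg_lookup(cg, q);
	int i;

	if (blkg)
		return blkg;

	blkg = calloc(1, sizeof(*blkg));
	if (!blkg)
		return NULL;
	blkg->cgroup = cg;
	blkg->queue = q;

	for (i = 0; i < TOY_NR_POLS; i++) {
		const struct toy_policy *pol = toy_policies[i];

		if (!pol || !pol->pdata_size)
			continue;	/* slot unused or no private data */
		blkg->pd[i] = calloc(1, pol->pdata_size);
		if (!blkg->pd[i]) {
			toy_blkg_free(blkg);
			return NULL;
		}
	}
	blkg->next = toy_blkg_list;
	toy_blkg_list = blkg;
	return blkg;
}

/* Policies reach their own data through the shared group. */
void *toy_blkg_to_pd(struct toy_blkg *blkg, int pol)
{
	assert(pol >= 0 && pol < TOY_NR_POLS);
	return blkg->pd[pol];
}
```

In this model the lookup key is the (cgroup, queue) pair alone, so two
policies asking about the same pair get the same toy_blkg and differ
only in which pd[] slot they use.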

This patchset is on top of

v3.3-rc2 62aa2b537c6f5957afd98e29f96897419ed5ebab
+ [1] blkcg: kill policy node and blkg->dev, take#4

and is also available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-unified-blkg

diffstat follows.

 block/blk-cgroup.c     |  674 +++++++++++++++++++++++++++++++++++--------------
 block/blk-cgroup.h     |  211 +++++++++++----
 block/blk-core.c       |   26 +
 block/blk-sysfs.c      |    6
 block/blk-throttle.c   |  232 +++-------------
 block/blk.h            |    2
 block/cfq-iosched.c    |  274 +++++--------------
 block/cfq.h            |   96 ++++--
 block/elevator.c       |    2
 include/linux/blkdev.h |    7
 10 files changed, 853 insertions(+), 677 deletions(-)



[1] http://thread.gmane.org/gmane.linux.kernel/1247152