Re: [RFC][PATCH 00/16] sched: Core scheduling

From: Greg Kerr
Date: Tue Feb 19 2019 - 17:07:16 EST


Thanks for posting this patchset, Peter. Based on the patch titled "sched: A
quick and dirty cgroup tagging interface", I believe cgroups are used to
define co-scheduling groups in this implementation.

Chrome OS engineers (kerrnel@xxxxxxxxxx, mpdenton@xxxxxxxxxx, and
palmer@xxxxxxxxxx) are considering an interface that is usable by unprivileged
userspace apps. cgroups are a global resource that require privileged access.
Have you considered an interface that is akin to namespaces? Consider the
following strawperson API proposal (I understand prctl() is generally used for
process-specific actions, so we aren't married to using prctl()):

# API Properties

The kernel introduces coscheduling groups, which specify which processes may
be executed together. An unprivileged process may use prctl() to create a
coscheduling group. The process may then join the coscheduling group, and
place any of its child processes into the coscheduling group. To provide
flexibility for unrelated processes to join pre-existing groups, an IPC
mechanism could send a coscheduling group handle between processes.
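If the handle returned by PR_CREATE_COSCHEDULING_GROUP were a file descriptor (the proposal leaves this open, though EMFILE hints at an fd-like table), the existing SCM_RIGHTS mechanism on AF_UNIX sockets could carry it between unrelated processes. A minimal sketch under that assumption; send_handle() and recv_handle() are hypothetical helper names, and only sendmsg()/recvmsg() with SCM_RIGHTS are real kernel interfaces:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends one descriptor over a connected AF_UNIX
 * socket. Returns 0 on success, -1 on failure. */
static int send_handle(int sock, int handle)
{
    char byte = 'g'; /* SOCK_STREAM needs at least one real data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &handle, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receives one descriptor; returns the new local
 * descriptor number, or -1 on failure. */
static int recv_handle(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same kernel object, which is the property a shared group handle would need; no new kernel API would be required for the transfer itself.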

# Strawperson API Proposal
To create a new coscheduling group:
int coscheduling_group = prctl(PR_CREATE_COSCHEDULING_GROUP);

The return value is >= 0 on success and -1 on failure, with the following
possible values for errno:

ENOTSUP: This kernel doesn't support the PR_CREATE_COSCHEDULING_GROUP
operation.
EMFILE: The process's kernel-side coscheduling group table is full.

To join a given process to the group:
pid_t process = getpid(); /* self or child... */
int status = prctl(PR_JOIN_COSCHEDULING_GROUP, coscheduling_group, process);
if (status) {
    err(EXIT_FAILURE, "PR_JOIN_COSCHEDULING_GROUP");
}

The kernel will check and enforce that the given process ID really is the
caller's own PID or the PID of one of the caller's children, and that the given
group ID really exists. The return value is 0 on success and -1 on failure,
with the following possible values for errno:

EPERM: The caller could not join the given process to the coscheduling
group because it was not the creator of the given coscheduling group.
EPERM: The caller could not join the given process to the coscheduling
group because the given process was not the caller or one of the caller's
children.
EINVAL: The given group ID did not exist in the kernel-side coscheduling
group table associated with the caller.
ESRCH: The given process did not exist.
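The join checks above can be modelled in userspace as a sketch. Assumptions, none of which the proposal pins down: "children" means direct children only, the order in which the checks run is arbitrary, and every type and function here (struct model, find_proc(), cosched_join()) is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userspace model; not a kernel API. */
#define MAX_GROUPS 8
#define MAX_PROCS 16

struct group { bool in_use; pid_t creator; };
struct proc { pid_t pid; pid_t ppid; int group; }; /* group -1: none */

struct model {
    struct group groups[MAX_GROUPS]; /* caller's kernel-side group table */
    struct proc procs[MAX_PROCS];    /* stand-in for the process tree */
    size_t nprocs;
};

static struct proc *find_proc(struct model *m, pid_t pid)
{
    for (size_t i = 0; i < m->nprocs; i++)
        if (m->procs[i].pid == pid)
            return &m->procs[i];
    return NULL;
}

/* Models PR_JOIN_COSCHEDULING_GROUP: returns 0, or -1 with errno set. */
static int cosched_join(struct model *m, pid_t caller, int group, pid_t target)
{
    if (group < 0 || group >= MAX_GROUPS || !m->groups[group].in_use) {
        errno = EINVAL; /* no such group in the caller's table */
        return -1;
    }
    struct proc *p = find_proc(m, target);
    if (p == NULL) {
        errno = ESRCH; /* target process does not exist */
        return -1;
    }
    if (m->groups[group].creator != caller) {
        errno = EPERM; /* caller did not create this group */
        return -1;
    }
    if (p->pid != caller && p->ppid != caller) {
        errno = EPERM; /* target is neither the caller nor its child */
        return -1;
    }
    p->group = group;
    return 0;
}
```

Both EPERM cases share one errno value here, as in the proposal, so a caller cannot tell them apart without a separate query.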

Regards,

Greg Kerr (kerrnel@xxxxxxxxxx)

On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>
> A much 'demanded' feature: core-scheduling :-(
>
> I still hate it with a passion, and that is part of why it took a little
> longer than 'promised'.
>
> While this one doesn't have all the 'features' of the previous (never
> published) version and isn't L1TF 'complete', I tend to like the structure
> better (relatively speaking: I hate it slightly less).
>
> This one is sched class agnostic and therefore, in principle, doesn't horribly
> wreck RT (in fact, RT could 'ab'use this by setting 'task->core_cookie = task'
> to force-idle siblings).
>
> Now, as hinted by that, there are semi sane reasons for actually having this.
> Various hardware features like Intel RDT - Memory Bandwidth Allocation, work
> per core (due to SMT fundamentally sharing caches) and therefore grouping
> related tasks on a core makes it more reliable.
>
> However; whichever way around you turn this cookie; it is expensive and nasty.
>
> It doesn't help that there are truly bonghit crazy proposals for using this out
> there, and I really hope to never see them in code.
>
> These patches are lightly tested and didn't insta explode, but no promises,
> they might just set your pets on fire.
>
> 'enjoy'
>
> @pjt; I know this isn't quite what we talked about, but this is where I ended
> up after I started typing. There are plenty of design decisions to question and my
> changelogs don't even get close to beginning to cover them all. Feel free to ask.
>
> ---
> include/linux/sched.h | 9 +-
> kernel/Kconfig.preempt | 8 +-
> kernel/sched/core.c | 762 ++++++++++++++++++++++++++++++++++++++++++++---
> kernel/sched/deadline.c | 99 +++---
> kernel/sched/debug.c | 4 +-
> kernel/sched/fair.c | 129 +++++---
> kernel/sched/idle.c | 42 ++-
> kernel/sched/pelt.h | 2 +-
> kernel/sched/rt.c | 96 +++---
> kernel/sched/sched.h | 183 ++++++++----
> kernel/sched/stop_task.c | 35 ++-
> kernel/sched/topology.c | 4 +-
> kernel/stop_machine.c | 2 +
> 13 files changed, 1096 insertions(+), 279 deletions(-)
>
>