Re: SCHED_DEADLINE with CPU affinity

From: Juri Lelli
Date: Wed Nov 20 2019 - 03:50:36 EST


Hi Philipp,

On 19/11/19 23:20, Philipp Stanner wrote:
> Hey folks,
> (please put me in CC when answering, I'm not subscribed)
>
> I'm currently a working student in the embedded industry. We have a device where
> we need to be able to process network data within a certain deadline. At the
> same time, safety is a primary requirement; that's why we construct everything
> fully redundant. That is, we have two network interfaces, each with its IRQ
> bound to one CPU core, and for each we spawn a container (systemd-nspawn,
> cgroups based) which in turn is bound to the corresponding CPU (via its CPU
> affinity mask).
>
>         Container0       Container1
>    -----------------  -----------------
>    |               |  |               |
>    |    Proc. A    |  |   Proc. A'    |
>    |    Proc. B    |  |   Proc. B'    |
>    |               |  |               |
>    -----------------  -----------------
>           ^                  ^
>           |                  |
>         CPU 0              CPU 1
>           |                  |
>        IRQ eth0           IRQ eth1
>
>
> Within each container several processes are started, ranging from systemd
> (SCHED_OTHER) to two (soft) real-time critical processes, which we want to
> run under SCHED_DEADLINE.
>
> Now, I've worked through the manpage describing scheduling policies, and it
> seems that our scenario is forbidden by the kernel.  I've done some tests with
> the syscalls sched_setattr and sched_setaffinity, trying to activate
> SCHED_DEADLINE while also binding to a certain core.  It fails with EINVAL or
> EBUSY, depending on the order of the syscalls.
>
> I've read that the kernel performs plausibility checks when you ask for a

Yeah, admission control.

> new deadline task to be scheduled, and I assume this check is what prevents us
> from implementing our intended architecture.
>
> Now, the questions we're having are:
>
>    1. Why does the kernel do this, what is the problem with scheduling with
>       SCHED_DEADLINE on a certain core? In contrast, how is it handled when
>       you have single core systems etc.? Why this artificial limitation?

Please also have a look (you only mentioned the manpage, so in case you
missed it) at

https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667

The document in general should hopefully explain why we need admission
control and what the current limitations regarding affinities are.
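
As a rough illustration of the admission test described in that document
(a simplified sketch, not the kernel's actual code): a task set is accepted
only if the summed runtime/period of all SCHED_DEADLINE tasks in a root
domain stays within M * (sched_rt_runtime_us / sched_rt_period_us), where M
is the number of CPUs in that root domain. The function name dl_admissible
and the sample task parameters are made up for this example; 950000 and
1000000 are the usual default values of the two sysctls.

```python
from fractions import Fraction


def dl_admissible(tasks, cpus, rt_runtime_us=950_000, rt_period_us=1_000_000):
    """Simplified utilization-based admission test.

    tasks: iterable of (runtime, period) pairs, both in the same time unit.
    cpus:  number of CPUs (M) in the root domain the tasks belong to.
    Returns (admitted, total_utilization, cap).
    """
    # Total bandwidth requested by all deadline tasks.
    total = sum(Fraction(runtime, period) for runtime, period in tasks)
    # Bandwidth available to deadline tasks in this root domain.
    cap = cpus * Fraction(rt_runtime_us, rt_period_us)
    return total <= cap, total, cap


# Hypothetical example: two tasks asking for 10% and 30% of one CPU.
admitted, total, cap = dl_admissible([(10, 100), (30, 100)], cpus=1)
print(admitted, float(total), float(cap))
```

Because the check is done per root domain, how the CPUs are partitioned
(see the cpuset discussion in the document) decides which M and which task
set the test is applied to.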

>    2. How can we possibly implement this? We don't want to use SCHED_FIFO,
>       because out-of-control tasks would freeze the entire container.

I experimented a bit with this kind of setup in the past, and I think I
made it work by pre-configuring exclusive cpusets (similar to what is
detailed in the doc above) and then starting the containers inside those
exclusive sets with podman run's --cgroup-parent option.

I don't have proper instructions yet for how to do this (I plan to put
them together soon-ish), but please see if you can make it work with
this hint.
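
The exclusive-cpuset part of the hint could look roughly like the sketch
below. It only mirrors the cgroup v1 cpuset steps from the example in
sched-deadline.rst, and it is an assumption that the same steps fit your
setup. The helper name make_exclusive_cpuset is hypothetical, the hierarchy
is assumed to be already mounted (the doc uses /dev/cpuset), and memory
node 0 is assumed. Starting the containers under these sets with
--cgroup-parent is not covered here.

```python
import os


def make_exclusive_cpuset(root, name, cpu, mem_node=0):
    """Create an exclusive cpuset `name` below a mounted v1 cpuset hierarchy.

    root: mount point of the cpuset hierarchy (assumed, e.g. /dev/cpuset).
    cpu:  the single CPU this set should own.
    Returns the path of the new cpuset directory.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)

    def write(relpath, value):
        # Same effect as `echo value > root/relpath`.
        with open(os.path.join(root, relpath), "w") as f:
            f.write(f"{value}\n")

    write(f"{name}/cpuset.cpus", cpu)
    write(f"{name}/cpuset.mems", mem_node)
    # Make the root set exclusive and stop load balancing across it, so
    # that the child sets end up in separate root domains.
    write("cpuset.cpu_exclusive", 1)
    write("cpuset.sched_load_balance", 0)
    write(f"{name}/cpuset.cpu_exclusive", 1)
    write(f"{name}/cpuset.mem_exclusive", 1)
    return path
```

For the two-core layout in your diagram, this would be called once per
core (as root), for example make_exclusive_cpuset("/dev/cpuset", "cpu0", 0)
and make_exclusive_cpuset("/dev/cpuset", "cpu1", 1).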

Best,

Juri