Re: [PATCH 4/6] nohz: support PR_DATAPLANE_QUIESCE

From: Ingo Molnar
Date: Tue May 12 2015 - 08:54:20 EST



* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > So if then a prctl() (or other system call) could be a shortcut
> > to:
> >
> > - move the task to an isolated CPU
> > - make sure there _is_ such an isolated domain available
> >
> > I.e. have some programmatic, kernel provided way for an
> > application to be sure it's running in the right environment.
> > Relying on random administration flags here and there won't cut
> > it.
>
> No, we already have sched_setaffinity() and we should not duplicate
> its ability to move tasks about.

But sched_setaffinity() does not guarantee isolation - it's just a
syscall to move a task to a set of CPUs, which might be isolated or
not.
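
To illustrate the point, here is a minimal sketch of what an application
can do today with sched_setaffinity(). The helper names first_allowed_cpu()
and pin_to_cpu() are made up for this example. The call succeeds whenever
the mask is acceptable to the kernel, whether or not the target CPU is
isolated from other tasks, timers or interrupts:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: return the lowest-numbered CPU the calling task
 * is currently allowed to run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper: pin the calling task to a single CPU.
 * Returns 0 on success, -1 with errno set on failure.
 * Success only means the affinity mask was accepted; it says
 * nothing about whether that CPU is isolated. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A caller would typically do pin_to_cpu(first_allowed_cpu()) and check for a
0 return, which shows the gap: the return code cannot tell the application
whether it ended up in an isolated environment.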

What I suggested is that it might make sense to offer a system call,
for example a sched_setparam() variant, that makes such guarantees.

Say if user-space does:

    ret = sched_setscheduler(0, BIND_ISOLATED, &isolation_params);

... then we would get the task moved to an isolated domain and get a 0
return code if the kernel is able to do all that and if the current
uid/namespace/etc. has the required permissions and such.

( BIND_ISOLATED will not replace the current p->policy value, so it's
still possible to use the regular policies as well on top of this. )

I.e. make it programmatic instead of relying on a fragile,
kernel-version-dependent combination of sysctl, sysfs, kernel config
and boot parameter details to get us this result.

I.e. provide a central hub to offer this feature in a more structured,
easier to use fashion.

We might still require the admin (or distro) to separately set up the
domain of isolated CPUs, and it would still be possible to simply
'move' tasks there using existing syscalls - but I say that it's not a
bad idea at all to offer a single central syscall interface for apps
to request such treatment.
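
As a rough sketch of the "admin sets up the domain, app moves itself there"
approach using only existing interfaces, user-space could approximate the
request as below. This is not the proposed syscall. It assumes a kernel
that exposes the isolated CPU list at /sys/devices/system/cpu/isolated
(not available on every kernel version), and the names parse_cpulist() and
bind_isolated() are invented for this example:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a kernel "cpulist" string such as "2-3,5" into a cpu_set_t.
 * Returns the number of CPUs found, or -1 on malformed input. */
int parse_cpulist(const char *s, cpu_set_t *set)
{
    CPU_ZERO(set);
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);

        if (end == s || lo < 0)
            return -1;
        long hi = lo;
        if (*end == '-') {              /* range of the form "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        if (hi >= CPU_SETSIZE)
            return -1;
        for (long c = lo; c <= hi; c++)
            CPU_SET((int)c, set);
        s = end;
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return -1;
    }
    return CPU_COUNT(set);
}

/* Move the calling task onto the isolated CPUs, if any are configured.
 * Returns 0 on success, -1 if no isolated domain is visible or the
 * affinity change fails. The sysfs path is an assumption about the
 * running kernel, not a guaranteed interface. */
int bind_isolated(void)
{
    char buf[256];
    cpu_set_t set;
    FILE *f = fopen("/sys/devices/system/cpu/isolated", "r");

    if (!f)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (!line || parse_cpulist(buf, &set) <= 0)
        return -1;
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Even this small helper depends on a sysfs file that may or may not exist
and on the admin having configured isolation at boot, which is the kind of
fragility a single kernel-provided interface would hide from applications.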

> What this is about is 'clearing' CPU state, it's nothing to do with
> tasks.
>
> Ideally we'd never have to clear the state because it should be
> impossible to get into this predicament in the first place.

That I absolutely agree with: that bit is nonsense.

We might offer debugging facilities to debug such bugs, but we won't
work around them or hack around them.

Thanks,

Ingo