Re: [PATCH 0/4] sched/rt: Distribute tasks in find_lowest_rq()

From: Vincent Guittot
Date: Tue Apr 21 2020 - 09:28:31 EST


On Tue, 21 Apr 2020 at 15:18, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
>
> On 21/04/20 13:13, Qais Yousef wrote:
> > On 04/14/20 19:58, Valentin Schneider wrote:
> >>
> >> I'm a bit wary about such blanket changes. I feel like most places impacted
> >> by this change don't gain anything by using the random thing. In sched land
> >> that would be:
> >
> > The API has always been clear that cpumask_any() returns a random cpu within the
> > mask. And given that it's a one-liner with cpumask_first() directly visible, the
> > fact that a user chose to stick with cpumask_any() indicates that that's what
> > they wanted.
> >
> > Probably a lot of them don't care what cpu is returned and are happy with the
> > random value. I don't see why it has to have an effect. Some could benefit,
> > like my use case here. Others truly don't care, and then it's fine to return
> > anything, as requested.
> >
>
> Exactly, *some* (which AFAICT is a minority) might benefit. So why should
> all the others pay the price for a functionality they do not need?
>
> I don't think your change would actually cause a splat somewhere; my point
> is about changing existing behaviour without having a story for it. The
> thing said 'pick a "random" cpu', sure, but it never did that, it always
> picked the first.
>
> I've pointed out two examples that want to be cpumask_first(), and I'm
> absolutely certain there are more than these two out there. What if folks
> ran some performance test and were completely fine with the _first()
> behaviour? What tells you randomness won't degrade some cases?

I tend to agree that "any" doesn't mean "random", and that using a random
CPU will create strange behavior.

One example is IRQ affinity on a b.L system. Right now, the IRQs are
always pinned to the same CPU (the first one, which is most probably a
little). With your change, we can imagine that this will change, and
might even change across two consecutive boots if, for whatever reason
(and this does happen), the drivers are not probed in the same order. In
the end, you will run some tests with the IRQs on a little CPU and other
times with the IRQs on a big one. More generally, any SMP system can be
impacted, because an IRQ will no longer be pinned to the same CPU,
alongside the same set of other IRQs, on every boot.
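
To make the ordering dependency concrete, here is a minimal userspace
sketch. It is not kernel code: the 64-bit mask_t stands in for struct
cpumask, and pick_first() and pick_rotating() are made-up names, with
pick_rotating() being only one assumed way of spreading picks rather than
what the series actually implements.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: bit n set means CPU n is in the mask. */
typedef uint64_t mask_t;

/* Deterministic pick: always the lowest-numbered CPU in the mask. */
static int pick_first(mask_t mask)
{
	int cpu = 0;

	assert(mask != 0);
	while (!(mask & ((mask_t)1 << cpu)))
		cpu++;
	return cpu;
}

/*
 * Hypothetical distributing pick: resume the search just after the
 * previous result, wrapping around at 64. The result depends on how
 * many callers came before, i.e. on call (probe) order.
 */
static int pick_rotating(mask_t mask, int *prev)
{
	int cpu = *prev;

	assert(mask != 0);
	do {
		cpu = (cpu + 1) % 64;
	} while (!(mask & ((mask_t)1 << cpu)));
	*prev = cpu;
	return cpu;
}
```

With a mask of 0xff (say CPUs 0-3 little, 4-7 big), pick_first() gives
CPU 0 to every caller regardless of order. With pick_rotating() and a
shared cursor, the caller that runs first gets CPU 0 and the next gets
CPU 1, so swapping the probe order of two drivers swaps their CPUs.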

>
> IMO the correct procedure is to keep everything as it is and improve the
> specific callsites that benefit from randomness. I get your point that

I agree with this point

> using cpumask_any() should be a good enough indicator of the latter, but I
> don't think it can realistically be followed. To give my PoV, if in the
> past someone had used a cpumask_any() where a cpumask_first() could do, I
> would've acked it (disclaimer: super representative population of sample
> size = 1).
>
> Flipping the switch on everyone to then have a series of patches "oh this
> one didn't need it", "this one neither", "I actually need this to be the
> first" just feels sloppy.
>
> > I CCed Marc who's the maintainer of this file who can clarify better if this
> > really breaks anything.
> >
> > If any interrupt expects to be affined to a specific CPU then this must be
> > described in DT/driver. I think the GIC controller is free to distribute them
> > to any cpu otherwise if !force. Which is usually done by irq_balancer anyway
> > in userspace, IIUC.
> >
> > I don't see how cpumask_any_and() breaks anything here either. I actually think it
> > improves things by better distributing the irqs on the system by default.
> >
>
> As you say, if someone wants smarter IRQ affinity they can do irq_balancer
> and whatnot. The default kernel policy for now has been to shove everything
> on the lowest-numbered CPU, and I see no valid reason to change that.