Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.

From: Joel Fernandes
Date: Tue Sep 01 2020 - 01:10:19 EST


On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@xxxxxxxxxxxxx wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
>
> > > So where do things go side-ways?
>
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
>
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?

Hi Peter,

I tried again and came up with the simple patch below, which handles all of
the issues and does not cause any more crashes. I added detailed commit
messages and code comments listing all the issues. Hope it makes sense now.
IMHO any other solution seems either unclear or adds overhead. The simple
solution below Just Works (TM) and does not add overhead.

Let me know what you think, thanks.

---8<-----------------------