Re: [Problem] Cache line starvation

From: Kurt Kanzenbach
Date: Fri Sep 28 2018 - 11:26:59 EST


On Fri, Sep 28, 2018 at 11:05:21AM +0200, Kurt Kanzenbach wrote:
> Hi Thomas,
>
> On Thu, Sep 27, 2018 at 04:47:47PM +0200, Thomas Gleixner wrote:
> > On Thu, 27 Sep 2018, Kurt Kanzenbach wrote:
> > > On Thu, Sep 27, 2018 at 04:25:47PM +0200, Kurt Kanzenbach wrote:
> > > > However, the issue still triggers fine. With stress-ng we're able to
> > > > generate latency in millisecond range. The only workaround we've found
> > > > so far is to add a "delay" in cpu_relax().
> > >
> > > It might be interesting to you how we added the delay. We've used:
> > >
> > > static inline void cpu_relax(void)
> > > {
> > > 	volatile int i = 0;
> > >
> > > 	asm volatile("yield" ::: "memory");
> > > 	while (i++ <= 1000);
> > > }
> > >
> > > Of course it's not efficient, but it works.
> >
> > I wonder if it's just the store on the stack which makes it work. I've seen
> > that when instrumenting x86. When the careful instrumentation just stayed
> > in registers it failed. Once it was too much and stack got involved it
> > vanished away.
>
> I've performed more tests: Adding a store to a global variable just
> before calling cpu_relax() doesn't help. Furthermore, adding up to 20
> yield instructions (just like you did on x86) didn't work either.
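
In case someone wants to experiment with the delay outside the kernel, a rough user-space approximation in C11 could look like the sketch below. It is only a sketch under assumptions, not the kernel change quoted above: relax_hint(), cpu_relax_delayed() and the tas_spin_*() helpers are made-up names, the "pause"/compiler-barrier fallbacks for non-arm64 targets are my addition, and the test-and-set lock is neither the kernel's ticket lock nor qspinlock. It only shows where the delay sits in a wait loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: spin-wait hint as used by cpu_relax().
 * arm64 "yield"; x86 "pause" and a plain compiler barrier elsewhere
 * are assumptions added for this user-space sketch. */
static inline void relax_hint(void)
{
#if defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Same idea as the delayed cpu_relax() above: one hint instruction,
 * then a busy loop on a volatile stack variable. */
static inline void cpu_relax_delayed(void)
{
	volatile int i = 0;

	relax_hint();
	while (i++ <= 1000)
		;
}

/* Hypothetical minimal test-and-set lock, only to exercise the delay. */
struct tas_lock {
	atomic_int locked; /* 0 = free, 1 = held */
};

static bool tas_spin_trylock(struct tas_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

static void tas_spin_lock(struct tas_lock *l)
{
	while (!tas_spin_trylock(l)) {
		/* Poll with plain loads until the lock looks free,
		 * backing off with the delayed relax between polls. */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			cpu_relax_delayed();
	}
}

static void tas_spin_unlock(struct tas_lock *l)
{
	/* Debug check: unlocking a lock that is not held is a bug. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The waiter only retries the atomic once the lock looks free, so the delay mainly reduces how often it touches the lock's cache line.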

In addition, the stress-ng test triggers the issue on v4.14-rt and
v4.18-rt as well.

As v4.18-rt still uses the old ticket spinlock implementation on arm64,
I've backported the qspinlock implementation to v4.18-rt. The commits I've
identified are:

- 598865c5f32d ("arm64: barrier: Implement smp_cond_load_relaxed")
- c11090474d70 ("arm64: locking: Replace ticket lock implementation with qspinlock")

With these commits applied, it's still possible to trigger the issue,
but it takes longer.
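
For context on the lock change itself, here is a minimal user-space ticket lock in C11 atomics. Again only a sketch: the names are made up and this is not the arm64 code from the commits above, which is written in inline assembly. It illustrates that with a ticket lock every waiter polls the same owner field, i.e. the same cache line the holder has to write on unlock. qspinlock instead queues additional waiters on per-CPU MCS nodes, so most of them spin on their own cache line.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin-wait hint (arm64 "yield", x86 "pause", else a
 * compiler barrier); an assumption for this user-space sketch. */
static inline void relax_hint(void)
{
#if defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical ticket lock layout. */
struct ticket_lock {
	atomic_uint next;  /* next ticket to hand out */
	atomic_uint owner; /* ticket currently allowed to hold the lock */
};

static void ticket_spin_lock(struct ticket_lock *l)
{
	/* Take a ticket; ordering is provided by the acquire load below. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* All waiters poll the same 'owner' word until their turn comes,
	 * a spin-until-condition loop similar in spirit to the generic
	 * smp_cond_load_relaxed() fallback (READ_ONCE + cpu_relax). */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		relax_hint();
}

static void ticket_spin_unlock(struct ticket_lock *l)
{
	/* Only the holder writes 'owner', so load + release store is enough. */
	unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

	/* Debug check: the lock must be held (a ticket is outstanding). */
	assert(cur != atomic_load_explicit(&l->next, memory_order_relaxed));
	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```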

Did I miss anything?

Thanks,
Kurt