Re: [Problem] Cache line starvation

From: Peter Zijlstra
Date: Wed Sep 26 2018 - 03:35:19 EST


On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote:
> Instrumentation always shows the same picture:
>
> CPU0                                        CPU1
> => do_syscall_64                            => do_syscall_64
> => SyS_ptrace                               => syscall_slow_exit_work
> => ptrace_check_attach                      => ptrace_do_notify / rt_read_unlock
> => wait_task_inactive                          rt_spin_lock_slowunlock()
>    -> while task_running()                     __rt_mutex_unlock_common()
>   /   check_task_state()                       mark_wakeup_next_waiter()
>   |     raw_spin_lock_irq(&p->pi_lock);        raw_spin_lock(&current->pi_lock);
>   |     .                                      .
>   |     raw_spin_unlock_irq(&p->pi_lock);      .
>    \  cpu_relax()                              .
>     -                                          .
>   *IRQ*                                        <lock acquired>
>
> In the error case we observe that the while() loop is repeated more than
> 5000 times, which indicates that the pi_lock can be acquired. CPU1, on the
> other hand, makes no progress while waiting for the same lock with
> interrupts disabled.

I've tried really hard to reproduce this in userspace, but so far have
not had any luck. Looks to be a real tricky thing to make happen.
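
For illustration only, here is a minimal sketch of what a userspace
approximation of the quoted pattern could look like. It is not the test
program referred to above. Everything in it is an assumption: a plain C11
test-and-set flag stands in for pi_lock (the kernel's raw spinlock is a
different implementation), and the names hammer(), waiter() and run_once()
are hypothetical. One thread plays CPU0 and takes and drops the lock in a
tight loop; the other plays CPU1 and tries to take it exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: a simple test-and-set lock standing in for p->pi_lock. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static atomic_bool hammer_started;
static atomic_bool waiter_waiting;
static atomic_bool waiter_done;

static void spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
		; /* busy-wait until the flag was previously clear */
}

static void spin_unlock(void)
{
	atomic_flag_clear_explicit(&lock, memory_order_release);
}

/* Plays CPU0: lock/unlock in a loop, like the while (task_running()) path. */
static void *hammer(void *arg)
{
	unsigned long *loops = arg;

	atomic_store(&hammer_started, true);
	while (!atomic_load(&waiter_done)) {
		spin_lock();
		spin_unlock();
		/* Count only acquisitions made after the waiter began trying. */
		if (atomic_load(&waiter_waiting))
			(*loops)++;
	}
	return NULL;
}

/* Plays CPU1: a single acquisition of the same lock. */
static void *waiter(void *arg)
{
	(void)arg;
	while (!atomic_load(&hammer_started))
		; /* make sure the hammer loop is already running */
	atomic_store(&waiter_waiting, true);
	spin_lock();
	spin_unlock();
	atomic_store(&waiter_done, true);
	return NULL;
}

/* Returns how often the hammer looped while the waiter was waiting. */
unsigned long run_once(void)
{
	pthread_t a, b;
	unsigned long loops = 0;
	int rc;

	atomic_store(&hammer_started, false);
	atomic_store(&waiter_waiting, false);
	atomic_store(&waiter_done, false);

	rc = pthread_create(&a, NULL, hammer, &loops);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, waiter, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(b, NULL);
	pthread_join(a, NULL);
	return loops;
}
```

Calling run_once() repeatedly from a main() and looking for outliers in the
returned count would be one way to hunt for the >5000 iterations case. Note
that the sketch cannot disable interrupts, and userspace threads can be
preempted while holding the lock, so a large count here is at best a hint
and not evidence of the same starvation.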