Re: Endless soft-lockups for compiling workload since next-20200519

From: Peter Zijlstra
Date: Thu May 21 2020 - 06:50:13 EST


On Thu, May 21, 2020 at 11:39:39AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:

> This:
>
> > smp_call_function_single_async() { smp_call_function_single_async() {
> > // verified csd->flags != CSD_LOCK // verified csd->flags != CSD_LOCK
> > csd->flags = CSD_LOCK csd->flags = CSD_LOCK
>
> concurrent smp_call_function_single_async() using the same csd is what
> I'm looking at as well.

So something like this ought to cure the fundamental problem and make
smp_call_function_single_async() more user friendly, but also more
expensive.

The problem is that while the ILB case is easy to fix, I can't seem to
find an equally nice solution for the ttwu_remote_queue() case; that
would basically require sticking the wake_csd in task_struct, I'll also
post that.

So it's either this:

---
kernel/smp.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 84303197caf9..d1ca2a2d1cc7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -109,6 +109,12 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd)
smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
}

+/*
+ * csd_lock() can use non-atomic operations to set CSD_FLAG_LOCK because it's
+ * users are careful to only use CPU-local data. IOW, there is no cross-cpu
+ * lock usage. Also, you're not allowed to use smp_call_function*() from IRQs,
+ * and must be extra careful from SoftIRQ.
+ */
static __always_inline void csd_lock(call_single_data_t *csd)
{
csd_lock_wait(csd);
@@ -318,7 +324,7 @@ EXPORT_SYMBOL(smp_call_function_single);

/**
* smp_call_function_single_async(): Run an asynchronous function on a
- * specific CPU.
+ * specific CPU.
* @cpu: The CPU to run on.
* @csd: Pre-allocated and setup data structure
*
@@ -339,18 +345,23 @@ EXPORT_SYMBOL(smp_call_function_single);
*/
int smp_call_function_single_async(int cpu, call_single_data_t *csd)
{
+ unsigned int csd_flags;
int err = 0;

preempt_disable();

- if (csd->flags & CSD_FLAG_LOCK) {
+ /*
+ * Unlike the regular smp_call_function*() APIs, this one is actually
+ * usable from IRQ context, also the -EBUSY return value suggests
+ * it is safe to share csd's.
+ */
+ csd_flags = READ_ONCE(csd->flags);
+ if (csd_flags & CSD_FLAG_LOCK ||
+ cmpxchg(&csd->flags, csd_flags, csd_flags | CSD_FLAG_LOCK) != csd_flags) {
err = -EBUSY;
goto out;
}

- csd->flags = CSD_FLAG_LOCK;
- smp_wmb();
-
err = generic_exec_single(cpu, csd, csd->func, csd->info);

out: