Re: smp_call_function_single lockups

From: Linus Torvalds
Date: Wed Feb 11 2015 - 13:18:51 EST


On Wed, Feb 11, 2015 at 5:19 AM, Rafael David Tinoco <inaddy@xxxxxxxxxx> wrote:
>
> - After applying patch provided by Thomas we were able to cause the
> lockup only after 6 days (also locked inside
> smp_call_function_single). Test performance (even for a nested kvm)
> was reduced substantially with 3.19 + this patch.

I think that just means that the patch from Thomas doesn't change
anything - the reason it takes longer to lock up is just the
performance reduction: whatever race causes the problem became harder
to hit, but wasn't fundamentally affected.

I think a more interesting thing to get is the traces from the other
CPUs when this happens. In a virtualized environment, that might be
easier to get than on real hardware, and if you are able to reproduce
this at will - especially with something recent like 3.19 - and could
capture those traces, that would be really good.

I'll think about this all, but we couldn't figure anything out last
time we looked at it, so without more clues, don't hold your breath.

That said, it *would* be good if we could get rid of the synchronous
behavior entirely, and make it a rule that if somebody wants to wait
for it, they'll have to do their own waiting. Because I still think
that CSD_FLAG_WAIT is pure and utter garbage. And I think that
Jens said that it is probably bogus to begin with.
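
Just to illustrate what "do their own waiting" would mean - a rough
sketch only, not part of the patch below, and example_sync_call() is a
made-up name:

	/*
	 * With CSD_FLAG_WAIT gone, a caller that wants synchronous
	 * semantics queues the csd, sends the IPI, and then simply spins
	 * on the LOCK bit itself - csd_unlock() on the target CPU is
	 * what eventually clears it.
	 */
	static void example_sync_call(struct call_single_data *csd)
	{
		/* ... add csd to the target cpu's queue and send the IPI,
		 * the way generic_exec_single() already does ... */
		csd_lock_wait(csd);	/* spins while CSD_FLAG_LOCK is set */
	}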

I also don't even see where the CSD_FLAG_WAIT bit would ever be
cleared, so it all looks completely buggy anyway.
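
To spell that out (just restating what the pre-patch code does, nothing
new): csd_lock() ORs bits in and csd_unlock() only masks the LOCK bit
off, so once some caller has set CSD_FLAG_WAIT it simply stays set
across every later reuse of that csd:

	csd->flags |= CSD_FLAG_WAIT;	/* set once by a waiting caller */
	/* the function runs remotely, then csd_unlock() only does: */
	csd->flags &= ~CSD_FLAG_LOCK;	/* WAIT still set for the next user */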

Does this (COMPLETELY UNTESTED!) attached patch change anything?

Linus
kernel/smp.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f38a1e692259..13a8e75e1379 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,7 +19,6 @@
 
 enum {
 	CSD_FLAG_LOCK = 0x01,
-	CSD_FLAG_WAIT = 0x02,
 };
 
 struct call_function_data {
@@ -114,26 +113,24 @@ static void csd_lock_wait(struct call_single_data *csd)
 static void csd_lock(struct call_single_data *csd)
 {
 	csd_lock_wait(csd);
-	csd->flags |= CSD_FLAG_LOCK;
+	csd->flags = CSD_FLAG_LOCK;
 
 	/*
 	 * prevent CPU from reordering the above assignment
 	 * to ->flags with any subsequent assignments to other
 	 * fields of the specified call_single_data structure:
 	 */
-	smp_mb();
+	smp_wmb();
 }
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
 	 */
-	smp_mb();
-
-	csd->flags &= ~CSD_FLAG_LOCK;
+	smp_store_release(&csd->flags, 0);
 }
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
@@ -173,9 +170,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 	csd->func = func;
 	csd->info = info;
 
-	if (wait)
-		csd->flags |= CSD_FLAG_WAIT;
-
 	/*
 	 * The list addition should be visible before sending the IPI
 	 * handler locks the list to pull the entry off it because of