Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync

From: qiaozhou
Date: Wed Jul 26 2017 - 21:30:05 EST



On 2017å07æ26æ 22:16, Thomas Gleixner wrote:
On Wed, 26 Jul 2017, qiaozhou wrote:

Cc'ed ARM folks.

I want to ask you for suggestions about how to fix one contention between
expire_timers and try_to_del_timer_sync. Thanks in advance.
The issue is a hard-lockup issue detected on our platform(arm64, one cluster
with 4 a53, and the other with 4 a73). The sequence is as below:
1. core0 checks expired timers, and wakes up a process on a73, in step 1.3.
2. core4 starts to run the process and try to delete the timer in sync method,
in step 2.1.
3. Before core0 can run to 1.4 and get the lock, core4 has already run from
2.1 to 2.6 in while loop. And in step 2.4, it fails since the timer is still
the running timer.

core0(a53): core4(a73):
run_timer_softirq
__run_timers
spin_lock_irq(&base->lock)
expire_timers()
1.1: base->running_timer = timer;
1.2: spin_unlock_irq(&base->lock);
1.3: call_timer_fn(timer, fn, data);
schedule()
1.4: spin_lock_irq(&base->lock); del_timer_sync(&timer)
1.5: back to 1.1 in while loop
2.1: try_to_del_timer_sync()
2.2: get_timer_base()
2.3: spin_lock_irqsave(&base->lock, flags);
2.4: check "base->running_timer != timer"
2.5: spin_unlock_irqrestore(&base->lock, flags);
2.6:cpu_relax(); (back to 2.1 in while loop)
This is horribly formatted.
Sorry for the format. Updated the calling sequence on two cores.

core0(a53): core4(a73):
run_timer_softirq
__run_timers
spin_lock_irq(&base->lock)
expire_timers()
1.1: base->running_timer = timer;
1.2: spin_unlock_irq(&base->lock);
1.3: call_timer_fn(timer, fn, data);
1.4: spin_lock_irq(&base->lock);
1.5: back to 1.1 in while loop

core4(a73):
schedule()
del_timer_sync(&timer)
2.1: try_to_del_timer_sync()
2.2: get_timer_base()
2.3: spin_lock_irqsave(&base->lock, flags);
2.4: check "base->running_timer != timer"
2.5: spin_unlock_irqrestore(&base->lock, flags);
2.6:cpu_relax(); (back to 2.1 in while loop)


Core0 runs @832MHz, and core4 runs @1.8GHz. A73 is also much more powerful in
design than a53. So the actual running is that a73 keeps looping in spin_lock
-> check running_timer -> spin_unlock -> spin_lock ->...., while a53 can't
even succeed to complete one store exclusive instruction, before it can enter
WFE to wait for lock release. It just keeps looping in +30 and+3c in below
asm, due to stxr fails.

So it deadloop, a53 can't jump out ldaxr/stxr loop, while a73 can't pass the
running_timer check. Finally the hard-lockup is triggered.(BTW, the issue
needs a long time to occur.)

<_raw_spin_lock_irq+0x2c>: prfm pstl1strm, [x19]
/<_raw_spin_lock_irq+0x30>: ldaxr w0, [x19]//
//<_raw_spin_lock_irq+0x34>: add w1, w0, #0x10, lsl #12//
//<_raw_spin_lock_irq+0x38>: stxr w2, w1, [x19]//
//<_raw_spin_lock_irq+0x3c>: cbnz w2, 0xffffff80089f52e0
<_raw_spin_lock_irq+0x30>/
<_raw_spin_lock_irq+0x40>: eor w1, w0, w0, ror #16
<_raw_spin_lock_irq+0x44>: cbz w1, 0xffffff80089f530c
<_raw_spin_lock_irq+0x5c>
<_raw_spin_lock_irq+0x48>: sevl
<_raw_spin_lock_irq+0x4c>: wfe
<_raw_spin_lock_irq+0x50>: ldaxrh w2, [x19]
<_raw_spin_lock_irq+0x54>: eor w1, w2, w0, lsr #16
<_raw_spin_lock_irq+0x58>: cbnz w1, 0xffffff80089f52fc
<_raw_spin_lock_irq+0x4c>

The loop on a53 only has 4 instructions, and loop on a73 has ~100
instructions. Still a53 has no chance to store exclusive successfully. It may
be related with ldaxr/stxr cost, core frequency, core number etc.

I have no idea of fixing it in spinlock/unlock implement, so I try to fix it
in timer driver. I prepared a raw patch, not sure it's the correct direction
to solve this issue. Could you help to give some suggestions? Thanks.

From fb8fbfeb8f9f92fdadd9920ce234fd433bc883e1 Mon Sep 17 00:00:00 2001
From: Qiao Zhou <qiaozhou@xxxxxxxxxxxx>
Date: Wed, 26 Jul 2017 20:30:33 +0800
Subject: [PATCH] RFC: timers: try to fix contention between expire_timers and
del_timer_sync

try to fix contention between expire_timers and del_timer_sync by
adding TIMER_WOKEN status, which means that the timer has already
woken up corresponding process.
Timers do a lot more things than waking up a process.

Just because this happens in a particular case for you where a wakeup is
involved this does not mean, that this is generally true.
Yes, you're right. My intention is that as the timer function is executed, maybe we can do something.

And that whole flag business is completely broken. There is a comment in
call_timer_fn() which says:

/*
* It is permissible to free the timer from inside the
* function that is called from it ....

So you _CANNOT_ touch timer after the function returned. It might be gone
already.
You're right. Touching timer here has risk.

For that particular timer case we can clear base->running_timer w/o the
lock held (see patch below), but this kind of

lock -> test -> unlock -> retry

loops are all over the place in the kernel, so this is going to hurt you
sooner than later in some other place.
It's true. This is the way spinlock is used normally and widely in kernel. I'll also ask ARM experts whether we can do something to avoid or reduce the chance of such issue. ARMv8.1 has one single instruction(ldadda) to replace the ldaxr/stxr loop. Hope it can improve and reduce the chance.

Thanks,

tglx

8<------------

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1301,10 +1301,12 @@ static void expire_timers(struct timer_b
if (timer->flags & TIMER_IRQSAFE) {
raw_spin_unlock(&base->lock);
call_timer_fn(timer, fn, data);
+ base->running_timer = NULL;
raw_spin_lock(&base->lock);
} else {
raw_spin_unlock_irq(&base->lock);
call_timer_fn(timer, fn, data);
+ base->running_timer = NULL;
raw_spin_lock_irq(&base->lock);
}
}
It should work for this particular issue and I'll test it. Previously I thought it was unsafe to touch base->running_timer without holding lock.

Thanks a lot for all the suggestions.

Best Regards
Qiao