Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync

From: Will Deacon
Date: Fri Jul 28 2017 - 05:28:33 EST


On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
> On 2017-07-26 18:29, qiaozhou wrote:
> >On 2017/07/26 22:16, Thomas Gleixner wrote:
> >>On Wed, 26 Jul 2017, qiaozhou wrote:
> >>For that particular timer case we can clear base->running_timer w/o the
> >>lock held (see patch below), but this kind of
> >>
> >> lock -> test -> unlock -> retry
> >>
> >>loops are all over the place in the kernel, so this is going to hurt you
> >>sooner than later in some other place.
> >It's true. This is the way spinlocks are normally and widely used in
> >the kernel. I'll also ask the ARM experts whether we can do something
> >to avoid or reduce the chance of such an issue. ARMv8.1 has a single
> >instruction (ldadda) to replace the ldaxr/stxr loop. Hopefully that
> >improves things and reduces the chance.
>
> I think we should have this discussion now - I brought this up earlier [1]
> and I promised a test case that I completely forgot about - but here it
> is (attached). Essentially a Big CPU in an acquire-check-release loop
> will have an unfair advantage over a little CPU concurrently attempting
> to acquire the same lock, in spite of the ticket implementation. If the Big
> CPU needs the little CPU to make forward progress: livelock.
>
> We've run into the same loop construct in other spots in the kernel, and
> the reason that a real symptom is so rare is that the retry loop on the
> 'Big' CPU needs to be interrupted just once by, say, an IRQ/FIQ and the
> live-lock is broken. If the entire retry loop is within an
> interrupt-disabled critical section then the odds of live-locking are
> much higher.
>
> An example of the problem on a previous kernel is here [2]. Changes to the
> workqueue code since may have fixed this particular instance.
>
> One solution was to use udelay(1) in such loops instead of cpu_relax(), but
> that's not very 'relaxing'. I'm not sure if there's something we could do
> within the ticket spin-lock implementation to deal with this.
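
For readers following along, the lock -> test -> unlock -> retry construct
described above has roughly the following shape. This is an illustrative
sketch only: the function name is made up and the details differ from the
real timers code, but it shows why a big CPU spinning here with interrupts
disabled can re-take base->lock on every iteration while the little CPU
that must finish the callback and clear base->running_timer never gets a
turn.

/* Illustrative sketch only: not the actual timers code. */
static void wait_for_running_timer(struct timer_base *base,
				   struct timer_list *timer)
{
	unsigned long flags;

	for (;;) {
		spin_lock_irqsave(&base->lock, flags);		/* lock   */
		if (base->running_timer != timer) {		/* test   */
			spin_unlock_irqrestore(&base->lock, flags);
			return;
		}
		spin_unlock_irqrestore(&base->lock, flags);	/* unlock */
		cpu_relax();					/* retry  */
	}
}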

Does bodging cpu_relax to back off to wfe after a while help? The event
stream will wake it up if nothing else does. Nasty patch below, but I'd be
interested to know whether or not it helps.

Will

--->8

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 64c9e78f9882..1f5a29c8612e 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,9 +149,11 @@ extern void release_thread(struct task_struct *);

unsigned long get_wchan(struct task_struct *p);

+void __cpu_relax(unsigned long pc);
+
static inline void cpu_relax(void)
{
-	asm volatile("yield" ::: "memory");
+	__cpu_relax(_THIS_IP_);
}

/* Thread switching */
diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
index 67368c7329c0..be8a698ea680 100644
--- a/arch/arm64/kernel/arm64ksyms.c
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -72,6 +72,8 @@ EXPORT_SYMBOL(_mcount);
NOKPROBE_SYMBOL(_mcount);
#endif

+EXPORT_SYMBOL(__cpu_relax);
+
/* arm-smccc */
EXPORT_SYMBOL(__arm_smccc_smc);
EXPORT_SYMBOL(__arm_smccc_hvc);
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 659ae8094ed5..c394c3704b7f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -403,6 +403,31 @@ unsigned long get_wchan(struct task_struct *p)
return ret;
}

+static DEFINE_PER_CPU(u64, __cpu_relax_data);
+
+#define CPU_RELAX_WFE_THRESHOLD 10000
+void __cpu_relax(unsigned long pc)
+{
+	u64 new, old = raw_cpu_read(__cpu_relax_data);
+	u32 old_pc, new_pc;
+	bool wfe = false;
+
+	old_pc = (u32)old;
+	new = new_pc = (u32)pc;
+
+	if (old_pc == new_pc) {
+		if ((old >> 32) > CPU_RELAX_WFE_THRESHOLD) {
+			asm volatile("sevl; wfe; wfe\n" ::: "memory");
+			wfe = true;
+		} else {
+			new = old + (1UL << 32);
+		}
+	}
+
+	if (this_cpu_cmpxchg(__cpu_relax_data, old, new) == old && !wfe)
+		asm volatile("yield" ::: "memory");
+}
+
unsigned long arch_align_stack(unsigned long sp)
{
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
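
As an aside on the ldaxr/stxr vs. ldadda point quoted near the top: with
the ARMv8.0 exclusives, the atomic add that grabs a ticket is a software
retry loop whose exclusive reservation a faster core can keep stealing,
whereas the ARMv8.1 LSE form is a single instruction. A rough sketch of
the two shapes, modelled on but not identical to the kernel's arm64
atomics (the function names are made up, and the LSE variant needs an
ARMv8.1-aware assembler):

/* Illustrative only: an atomic add done both ways on arm64. */
static inline void llsc_atomic_add(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	/* ARMv8.0 LL/SC: loop until the store-exclusive succeeds */
	asm volatile(
	"1:	ldxr	%w0, %2\n"
	"	add	%w0, %w0, %w3\n"
	"	stxr	%w1, %w0, %2\n"
	"	cbnz	%w1, 1b\n"
	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
	: "r" (i));
}

static inline int lse_atomic_fetch_add_acquire(int i, atomic_t *v)
{
	int old;

	/* ARMv8.1 LSE: the whole read-modify-write is one instruction */
	asm volatile(
	"	ldadda	%w2, %w0, %1\n"
	: "=&r" (old), "+Q" (v->counter)
	: "r" (i)
	: "memory");

	return old;
}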