Re: [PATCH] mm/slub: fix lockups on PREEMPT && !SMP kernels

From: Mark Rutland
Date: Tue Mar 17 2015 - 08:01:39 EST


Hi,

> On Fri, Mar 13, 2015 at 03:47:12PM +0000, Mark Rutland wrote:
> > Commit 9aabf810a67cd97e ("mm/slub: optimize alloc/free fastpath by
> > removing preemption on/off") introduced an occasional hang for kernels
> > built with CONFIG_PREEMPT && !CONFIG_SMP.
> >
> > The problem is the following loop the patch introduced to
> > slab_alloc_node and slab_free:
> >
> > 	do {
> > 		tid = this_cpu_read(s->cpu_slab->tid);
> > 		c = raw_cpu_ptr(s->cpu_slab);
> > 	} while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
> >
> > GCC 4.9 has been observed to hoist the load of c and c->tid above the
> > loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time
> > constant and does not force a reload). On arm64 the generated assembly
> > looks like:
> >
> > ffffffc00016d3c4: f9400404 ldr x4, [x0,#8]
> > ffffffc00016d3c8: f9400401 ldr x1, [x0,#8]
> > ffffffc00016d3cc: eb04003f cmp x1, x4
> > ffffffc00016d3d0: 54ffffc1 b.ne ffffffc00016d3c8 <slab_alloc_node.constprop.82+0x30>
> >
> > If the thread is preempted between the load of c->tid (into x1) and tid
> > (into x4), and an allocation or free occurs in another thread (bumping
> > the cpu_slab's tid), the thread will be stuck in the loop until
> > s->cpu_slab->tid wraps, which may be forever in the absence of
> > allocations on the same CPU.
>
> Is there any method to guarantee refetching these in each loop?

We can use READ_ONCE(c->tid), e.g.

	while (IS_ENABLED(CONFIG_PREEMPT) &&
	       unlikely(tid != READ_ONCE(c->tid)));

I will send a patch to that effect.

I previously thought that READ_ONCE wasn't guaranteed to be atomic and
could return torn values (even for a single load instruction). I now
understand that this is not the case, and that a READ_ONCE will be
sufficient to force the reload on each iteration.
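
For illustration, here is a minimal userspace sketch of what the
READ_ONCE() buys us. It is not the kernel code: READ_ONCE() is only
approximated as a volatile access (using the GCC/Clang __typeof__
extension), struct cpu_slab_sketch and read_stable_tid() are made-up
names, and both reads hit the same object, as they would on a !SMP
kernel.

```c
/*
 * Userspace approximation of READ_ONCE(): a volatile access, which the
 * compiler has to perform every time the expression is evaluated.
 * (Assumption: this only stands in for the kernel macro.)
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, cut-down stand-in for the per-cpu slab structure. */
struct cpu_slab_sketch {
	unsigned long tid;
};

/*
 * Take a snapshot of tid and retry until a second read agrees with it.
 * Because the second read is volatile, it cannot be hoisted out of the
 * loop and compared against a stale register copy forever.
 */
static unsigned long read_stable_tid(struct cpu_slab_sketch *c)
{
	unsigned long tid;

	do {
		tid = READ_ONCE(c->tid);
	} while (tid != READ_ONCE(c->tid));

	return tid;
}
```

With a plain c->tid in the loop condition the compiler would be free to
load it once before the loop, which is the hang described above.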

[...]

> If c->tid, c->freelist, c->page are fetched on the other cpu,
> there is no ordering guarantee, and c->freelist and c->page could be
> stale values even if c->tid is the recent one.

Ah. Good point.

> Think about following free case with your patch.
>
> Assume initial cpu 0's state as following.
> c->tid: 1, c->freelist: NULL, c->page: A
>
> User X: try to free object X for page A
> User X: fetch c (s->cpu_slab)
>
> Preemption and migration happens...
> The other allocation/free happens... so cpu 0's state is as following.
> c->tid: 3, c->freelist: NULL, c->page: B
>
> User X: read c->tid: 3, c->freelist: NULL, c->page: A (stale value)
>
> Because tid and freelist are matched with current ones, free would
> succeed, but, current c->page is B and object is for A so this success
> is wrong.

Thanks for the example; it's extremely helpful!
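
To check that I follow, the sequence can be replayed in a small
single-threaded userspace sketch. Everything here is an assumption made
for illustration: the struct and function names are invented, the stale
read of c->page is modelled by simply reading it before the other
activity happens, and commit_free() stands in for the fastpath commit
by comparing only freelist and tid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the per-cpu slab state. */
struct cpu_slab_sketch {
	void *freelist;
	unsigned long tid;
	void *page;
};

/* Models the commit step: only freelist and tid are compared. */
static bool commit_free(struct cpu_slab_sketch *c, void *old_freelist,
			unsigned long old_tid, void *object)
{
	if (c->freelist != old_freelist || c->tid != old_tid)
		return false;
	c->freelist = object;
	c->tid = old_tid + 1;
	return true;
}

/* Returns true if the free of X is accepted although c->page != A. */
static bool replay_stale_page_free(void)
{
	char page_a, page_b, object_x;	/* addresses used as identities */
	struct cpu_slab_sketch c = {
		.freelist = NULL, .tid = 1, .page = &page_a,
	};
	void *seen_page, *seen_freelist;
	unsigned long seen_tid;

	/* User X's read of c->page effectively happens early (stale). */
	seen_page = c.page;

	/* Other allocations/frees: tid 3, freelist NULL, page B. */
	c.tid = 3;
	c.page = &page_b;

	/* User X now reads the current tid and freelist. */
	seen_tid = c.tid;
	seen_freelist = c.freelist;

	/* Object X belongs to page A, so the fastpath check passes. */
	if (seen_page != (void *)&page_a)
		return false;

	/* The commit succeeds, yet the current page is B, not A. */
	return commit_free(&c, seen_freelist, seen_tid, &object_x) &&
	       c.page != (void *)&page_a;
}
```

In this model replay_stale_page_free() returns true, i.e. the free is
accepted against the wrong page, which matches your description.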

Mark.