[Patch] Day-one race in slab.c

Stephen C. Tweedie (sct@redhat.com)
Wed, 4 Nov 1998 22:10:30 GMT


Hi,

For a few days I've been chasing what looked like a skbuff bug in all
recent 2.1 kernels. The symptom was repeated

kmem_free: NULL ptr (objp=c009e928, name=unknown)

when doing a kmem_cache_reap() on the skbuff_head_cache. I think I've
finally traced it to a long-time bug in the slab cache itself. The
problem only occurs if an interrupt slab allocation hits a race in the
cache reaping, which is why it seems to be quite rare. On a 16MB test
box, I cannot reproduce the problem, but on 8MB, an NFS build will hit
it reliably in 5 to 10 minutes.

The problem is at the end of kmem_slab_destroy: we destroy the slab data
before destroying the optional management and index structures
associated with the slab. Unfortunately, if the slab is one of the
standard small-object slabs which include the management structure
within the slab page, deallocating the slab also destroys the slabp
object, and when immediately afterwards we check slabp->s_index to see
if the index needs to be freed, we can pick up new, bogus data if the
page has been reused.
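
To make the ordering problem concrete, here is a minimal user-space
sketch. It is not the kernel code: struct toy_slab, toy_slab_create(),
toy_freepages() and the two toy_destroy_*() helpers are names invented
for this illustration, and "the page has been reused" is modelled
simply by overwriting the page with a fill pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical, cut-down stand-in for the slab management structure. */
struct toy_slab {
	void *s_index;	/* separate index block, or NULL if there is none */
};

/* Build a one-page slab with the management structure embedded at the
 * end of the page (an assumption of this model) and no separate index. */
static struct toy_slab *toy_slab_create(unsigned char **pagep)
{
	unsigned char *page = malloc(TOY_PAGE_SIZE);
	struct toy_slab *slabp;

	assert(page != NULL);
	memset(page, 0, TOY_PAGE_SIZE);
	slabp = (struct toy_slab *)(page + TOY_PAGE_SIZE - sizeof(*slabp));
	slabp->s_index = NULL;
	*pagep = page;
	return slabp;
}

/* Models kmem_freepages() followed at once by an interrupt-time
 * allocation that reuses the page: the old contents are overwritten. */
static void toy_freepages(unsigned char *page)
{
	memset(page, 0xA5, TOY_PAGE_SIZE);
}

/* Old ordering: free the page, then look at slabp->s_index.
 * Returns 1 if we would go on to "free" a bogus index. */
static int toy_destroy_old(void)
{
	unsigned char *page;
	struct toy_slab *slabp = toy_slab_create(&page);
	int would_free_index;

	toy_freepages(page);		/* slabp now points at reused data */
	would_free_index = (slabp->s_index != NULL);
	free(page);			/* release the model's backing store */
	return would_free_index;
}

/* New ordering: look at slabp->s_index first, then free the page. */
static int toy_destroy_new(void)
{
	unsigned char *page;
	struct toy_slab *slabp = toy_slab_create(&page);
	int would_free_index;

	would_free_index = (slabp->s_index != NULL);
	toy_freepages(page);
	free(page);			/* release the model's backing store */
	return would_free_index;
}
```

In this model toy_destroy_old() returns 1, because it reads the fill
pattern where s_index used to be, while toy_destroy_new() returns 0.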

I'm not sure whether we can ever get a more serious oops from this
problem, but if we can, it should be quite rare, only hurting us if we
have a cache with separate slab indexes but embedded management
structures. Any slab with no index will merely result in the above
kmem_free warning as kmem_cache_free() will be passed a null pointer from
cachep->c_index_cachep.

Patch for your pleasure:
----------------------------------------------------------------
--- mm/slab.c.~1~ Wed Nov 4 10:31:42 1998
+++ mm/slab.c Wed Nov 4 20:07:07 1998
@@ -650,9 +658,9 @@
 	}

 	slabp->s_magic = SLAB_MAGIC_DESTROYED;
-	kmem_freepages(cachep, slabp->s_mem-slabp->s_offset);
 	if (slabp->s_index)
 		kmem_cache_free(cachep->c_index_cachep, slabp->s_index);
+	kmem_freepages(cachep, slabp->s_mem-slabp->s_offset);
 	if (SLAB_OFF_SLAB(cachep->c_flags))
 		kmem_cache_free(cache_slabp, slabp);
 }
----------------------------------------------------------------

I've had the NFS build test running for an hour and a half with this
patch applied, and can no longer reproduce the problem.

Cheers,
Stephen.
