Re: OOPS with slab cache

From: Andreas Dilger (adilger@home.com)
Date: Tue Jan 11 2000 - 11:59:57 EST


tigran writes:
> You said that you get an oops because of dereferencing cache name via
> /proc/slabinfo even though you destroy the cache with
> kmem_cache_destroy()?
>
> First of all, as far as I remember, Mark Hemment (author of slab) put the
> comment about not destroying the cache *before* Andrea Arcangeli added the
> destroy functioniality so your observation about it doesn't indicate that
> there should be a problem with kmem_cache_destroy()

Actually, the OOPS has nothing to with my code. Try loading any module
that uses a slab cache (nfs is a good example). Unload the module and
cat /proc/slabinfo - OOPS. Soon your machine will hang (I think it happens
when another slab is added to the cache). It is sometimes not possible to
reload the module. The reason for the hang is that the "name" field is
in the memory space of the module, and it isn't there when the module is
unloaded, hence "cat /proc/slabinfo" gets an OOPS when sprintf'ing the name.

This is one problem which I already have a patch for (limit 20 character
names, which is long enough for all of the caches in the current kernel
code - and there is now a length check).

> I think your idea of writing a kmem_cache_find_byname() is not a very good
> one, but unrelated to this problem. (why is it not good? because slab
> allows to allocate multiple caches under the same name, although it prints
> a warning about it).

When can you use multiple caches of the same name? Either it is a name
conflict, which is one issue that can be solved, or you are repeatedly
allocating the same cache. Try

modprobe nfs; rmmod nfs; insmod nfs; cat /proc/slabinfo

Now you have two sets of all the nfs slab caches, the old one never being
used. How bad will it get if you have auto-load and -unload of a module
which is using a cache? In my patch, you will get the same cache back if
all of your allocate parameters are the same (name, size, offset, flags,
ctor, and dtor). However, even though I verified the cache pointer was
returned the same, the first slab allocation failed in my test. It may be
that there is still a bug in my patch, which I haven't tested thoroughly
yet (this part is not included here).

> I think the problem here is not that destroy does not work (it does work!)
> but that you probably tried to destroy the cache with some objects still
> allocated from it. So, kmem_cache_destroy() tried to __kmem_cache_shrink()
> it but it couldn't, so, if you had checked the return from
> kmem_cache_destroy() you would probably have found that the above is the
> case.

Actually, I tried both shrink and destroy without success. If destroy does
work, then this may be a solution. However, I have a hard time finding any
usage of kmem_cache_destroy()... USB driver uhci gets around the "name
pointing to unloaded memory" by kmallocing 16 bytes for the name, but it still
is using shrink instead of destroy, so someone with USB could probably test
and see if it creates a new slab cache each time it is loaded and unloaded.

Cheers, Andreas
--- cut here ---
--- slab.c Mon Jan 10 21:33:05 2000
+++ slab.c.new Tue Jan 11 09:48:55 2000
@@ -216,6 +216,8 @@
 
 #endif /* SLAB_DEBUG_SUPPORT */
 
+#define SLAB_CACHE_NAME_LEN 20 /* max name length for a slab cache */
+
 /* Cache struct - manages a cache.
  * First four members are commonly referenced during an alloc/free operation.
  */
@@ -241,7 +243,7 @@
         size_t c_colour; /* cache colouring range */
         size_t c_colour_next;/* cache colouring */
         unsigned long c_failures;
- const char *c_name;
+ char c_name[SLAB_CACHE_NAME_LEN];
         struct kmem_cache_s *c_nextp;
         kmem_cache_t *c_index_cachep;
 #if SLAB_STATS
@@ -674,7 +676,6 @@
 /* Create a cache:
  * Returns a ptr to the cache on success, NULL on failure.
  * Cannot be called within a int, but can be interrupted.
- * NOTE: The 'name' is assumed to be memory that is _not_ going to disappear.
  */
 kmem_cache_t *
 kmem_cache_create(const char *name, size_t size, size_t offset,
@@ -694,6 +695,10 @@
                 printk("%sNULL ptr\n", func_nm);
                 goto opps;
         }
+ if (strlen(name) >= SLAB_CACHE_NAME_LEN) {
+ printk("%sname too long\n", func_nm);
+ goto opps;
+ }
         if (in_interrupt()) {
                 printk("%sCalled during int - %s\n", func_nm, name);
                 goto opps;
@@ -955,7 +960,8 @@
         cachep->c_ctor = ctor;
         cachep->c_dtor = dtor;
         cachep->c_magic = SLAB_C_MAGIC;
- cachep->c_name = name; /* Simply point to the name. */
+ /* Copy name over so we don't have problems with unloaded modules */
+ strcpy(cachep->c_name, name);
         spin_lock_init(&cachep->c_spinlock);
 
         /* Need the semaphore to access the chain. */
--- cut here ---

-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Jan 15 2000 - 21:00:18 EST