On Thu, 12 Jun 2008 20:33:23 +0200 Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> wrote:
For the size-96 dump below: bufctl[0x18] = 0x1b instead of 0x1f yields a valid bufctl chain.
Pekka J Enberg wrote:
Hi Andrew,
Hmm. double kfree() should be caught by the redzone code.
On Wed, 11 Jun 2008, Andrew Morton wrote:
version is ltp-full-20070228 (lots of retro-computing there).
Looking at the dump, slabp->free is 0x0f and the bufctl it points to is 0xff ("BUFCTL_END"), which marks the last element in the chain. This is wrong, as the total number of objects in the slab (cachep->num) is 26 but the number of objects in use (slabp->inuse) is 20. So somehow you have managed to lose 6 objects from the bufctl chain.
Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
./testcases/bin/msgctl08 crashes after ten minutes or so:
slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:
000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
And I disagree with your link interpretation:
000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010:
    inuse: 14 00 00 00 (20 entries in use, 6 should be free)
    free: 0f 00 00 00
    nodeid: 00 00 00 00
    bufctl[0x00] ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff
    bufctl[0x04] fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff
    bufctl[0x08] fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00
    bufctl[0x0c] fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00
    bufctl[0x10] 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00
    bufctl[0x14] fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff
    bufctl[0x18] fd ff ff ff
080: 10 00 00 00
free: points to entry 0x0f.
bufctl[0x0f] is 0x19, i.e. it points to entry 0x19
0x19 points to 0x10
0x10 points to 0x17
0x17 is a BUFCTL_ACTIVE - that's a bug.
but: 0x13 is a valid link entry, it points to 0x0b
0x0b points to 0x00, and bufctl[0x00] is BUFCTL_END.
IMHO the most probable bug is a single bit error:
bufctl[0x10] should be 0x13 instead of 0x17.
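The walk above can be replayed mechanically. What follows is a minimal Python sketch, not kernel code. It assumes a 32-bit little-endian layout with inuse at offset 0x10, free at 0x14 and bufctl[0] at 0x1c, as in the annotated dump, and it takes "fd ff ff ff" as BUFCTL_ACTIVE and "ff ff ff ff" as BUFCTL_END. The helper names parse() and walk() are made up for illustration.

```python
import struct

# The size-128 slab dump posted in this thread (cachep->num = 26).
DUMP = """
000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
"""

BUFCTL_END = 0xFFFFFFFF     # "ff ff ff ff"
BUFCTL_ACTIVE = 0xFFFFFFFD  # "fd ff ff ff"
FIRST_BUFCTL_WORD = 0x1C // 4  # assumed offset of bufctl[0], per the annotation


def parse(dump):
    """Return (inuse, free, bufctl list) from a hexdump of 32-bit LE words."""
    raw = bytes(int(byte, 16)
                for line in dump.strip().splitlines()
                for byte in line.split(":", 1)[1].split())
    words = struct.unpack("<%dI" % (len(raw) // 4), raw)
    return words[4], words[5], list(words[FIRST_BUFCTL_WORD:])


def walk(free, bufctl):
    """Follow the free chain; return (visited indices, problem or None)."""
    seen, idx = [], free
    while idx != BUFCTL_END:
        if idx >= len(bufctl):
            return seen, "index 0x%02x is out of range" % idx
        if idx in seen:
            return seen, "loop back to 0x%02x" % idx
        if bufctl[idx] == BUFCTL_ACTIVE:
            return seen, "0x%02x is BUFCTL_ACTIVE" % idx
        seen.append(idx)
        idx = bufctl[idx]
    return seen, None


inuse, free, bufctl = parse(DUMP)
print("as posted:", [hex(i) for i in walk(free, bufctl)[0]], walk(free, bufctl)[1])

bufctl[0x10] = 0x13  # the single-bit correction proposed above
print("patched:  ", [hex(i) for i in walk(free, bufctl)[0]], walk(free, bufctl)[1])
```

With the dump as posted the walk stops when it reaches 0x17. With bufctl[0x10] patched to 0x13 it visits six entries and ends at BUFCTL_END, which matches 26 - 20 = 6 free objects.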
What about printing all redzone words? That would allow us to validate the bufctl chain.
Andrew: Could you post the new oops?
umm, what new oops?
I have four saved away here:
slab: Internal list corruption detected in cache 'size-96'(32), slabp ea2a5040(28). Hexdump:
000: 20 90 b5 ec 88 54 80 f7 e0 00 00 00 e0 50 2a ea
010: 1c 00 00 00 17 00 00 00 00 00 00 00 fd ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
070: fd ff ff ff fd ff ff ff 18 00 00 00 1f 00 00 00
080: fd ff ff ff fd ff ff ff 1c 00 00 00 ff ff ff ff
090: fd ff ff ff fd ff ff ff fd ff ff ff
For the size-128 dump at slabp f2905000 below: bufctl[0x10] = 0x13 instead of 0x17 creates a valid chain.
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
last sysfs file:
slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:
000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
slab: Internal list corruption detected in cache 'size-128'(26), slabp f7159000(18). Hexdump:
000: 00 f0 f8 f2 88 32 c0 f7 88 00 00 00 88 90 15 f7
010: 12 00 00 00 08 00 00 00 00 00 00 00
    bufctl[0x00] 13 00 00 00
020: fd ff ff ff fd ff ff ff fd ff ff ff
slab: Internal list corruption detected in cache 'size-128'(26), slabp ed9a9000(21). Hexdump:
000: 00 c0 3a f3 88 32 80 f7 88 00 00 00 88 90 9a ed
010: 15 00 00 00 12 00 00 00 00 00 00 00
    bufctl[0x00] fd ff ff ff
020: fd ff ff ff fd ff ff ff 07 00 00 00
but they're all from under basically the same conditions.
All bugs appear to be a spurious 0x04 bit in a bufctl[nr] entry with nr % 8 == 0.
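For the two dumps that are quoted in full (size-96 at ea2a5040 and size-128 at f2905000), the 0x04 pattern can be checked by brute force: flip every single bit in every bufctl word and keep the flips that give a chain of exactly num - inuse free entries. This is only a sketch in Python under assumptions: 32-bit little-endian words, inuse at 0x10, free at 0x14, bufctl[0] at 0x1c, "fd ff ff ff" as BUFCTL_ACTIVE. The names parse(), chain_ok() and single_bit_repairs() are made up for illustration. The other two dumps are truncated here, so they are left out.

```python
import struct

BUFCTL_END = 0xFFFFFFFF     # "ff ff ff ff"
BUFCTL_ACTIVE = 0xFFFFFFFD  # "fd ff ff ff"
FIRST_BUFCTL_WORD = 0x1C // 4  # assumed offset of bufctl[0]

# (cachep->num, hexdump) for the two complete dumps in this thread.
SIZE_96 = (32, """
000: 20 90 b5 ec 88 54 80 f7 e0 00 00 00 e0 50 2a ea
010: 1c 00 00 00 17 00 00 00 00 00 00 00 fd ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
070: fd ff ff ff fd ff ff ff 18 00 00 00 1f 00 00 00
080: fd ff ff ff fd ff ff ff 1c 00 00 00 ff ff ff ff
090: fd ff ff ff fd ff ff ff fd ff ff ff
""")
SIZE_128 = (26, """
000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
""")


def parse(num, dump):
    """Return (inuse, free, first `num` bufctl words) from a hexdump."""
    raw = bytes(int(byte, 16)
                for line in dump.strip().splitlines()
                for byte in line.split(":", 1)[1].split())
    words = struct.unpack("<%dI" % (len(raw) // 4), raw)
    return words[4], words[5], list(words[FIRST_BUFCTL_WORD:FIRST_BUFCTL_WORD + num])


def chain_ok(free, bufctl, expected_free):
    """True if the free chain is loop-free, in range, and has the expected length."""
    seen, idx = set(), free
    while idx != BUFCTL_END:
        if idx >= len(bufctl) or idx in seen or bufctl[idx] == BUFCTL_ACTIVE:
            return False
        seen.add(idx)
        idx = bufctl[idx]
    return len(seen) == expected_free


def single_bit_repairs(num, dump):
    """List (index, old, new) for every one-bit flip that yields a valid chain."""
    inuse, free, bufctl = parse(num, dump)
    hits = []
    for i, old in enumerate(bufctl):
        for bit in range(32):
            patched = list(bufctl)
            patched[i] = old ^ (1 << bit)
            if chain_ok(free, patched, num - inuse):
                hits.append((i, old, patched[i]))
    return hits


for name, slab in (("size-96", SIZE_96), ("size-128", SIZE_128)):
    for i, old, new in single_bit_repairs(*slab):
        print("%s: bufctl[0x%02x] 0x%02x -> 0x%02x (mask 0x%02x)"
              % (name, i, old, new, old ^ new))
```

Under these assumptions each of the two dumps has exactly one single-bit repair, and in both cases it clears bit 0x04 in an entry whose index is a multiple of 8: bufctl[0x18] in the size-96 slab and bufctl[0x10] in the size-128 slab.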