Re: [Bug #13112] Oops in drain_array

From: Bart
Date: Mon Apr 27 2009 - 12:58:32 EST





On Mon, 27 Apr 2009, Christoph Lameter wrote:

On Mon, 27 Apr 2009, Pekka Enberg wrote:

18: 4a 8b 8c eb 68 01 00 mov 0x168(%rbx,%r13,8),%rcx # l3 = cachep->nodelists[node];
1f: 00
20: 48 8b 16 mov (%rsi),%rdx
23: 48 8b 46 08 mov 0x8(%rsi),%rax
27: 48 89 42 08 mov %rax,0x8(%rdx)
2b:* 48 89 10 mov %rdx,(%rax) <-- trapping instruction
2e: 89 e8 mov %ebp,%eax
30: 48 c7 06 00 01 10 00 movq $0x100100,(%rsi)
37: 48 c7 46 08 00 02 20 movq $0x200200,0x8(%rsi)

It seems like list_del() in free_block() explodes because
->prev ("rax") of slab->list is bogus ("0000000000000cd0").
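For readers following the disassembly, here is a minimal userspace sketch of what list_del() does, with each statement matched to the instruction it appears to correspond to. It is a simplified re-creation, not the kernel source; list_init() and list_add() are helpers added only so the example can be exercised. The two poison constants are the values visible in the movq instructions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of a doubly linked list node (next at offset 0,
 * prev at offset 8 on a 64-bit build). */
struct list_head {
    struct list_head *next, *prev;
};

/* Poison values written into a node once it has been unlinked; these are
 * the 0x100100 and 0x200200 immediates in the disassembly. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x200200)

/* Helper for this sketch only: make an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Helper for this sketch only: insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void list_del(struct list_head *entry)
{
    struct list_head *next = entry->next; /* mov (%rsi),%rdx    */
    struct list_head *prev = entry->prev; /* mov 0x8(%rsi),%rax */

    next->prev = prev;                    /* mov %rax,0x8(%rdx) */
    prev->next = next;                    /* mov %rdx,(%rax): faults if prev is garbage */

    entry->next = LIST_POISON1;           /* movq $0x100100,(%rsi)    */
    entry->prev = LIST_POISON2;           /* movq $0x200200,0x8(%rsi) */
}
```

With ->prev equal to 0xcd0, the store through prev is the first access to an unmapped address, which would be consistent with the trap landing on that instruction.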

Where do I find the rest of the information regarding this report?
Bugzilla contains only a pointer to the initial report on lkml, no
discussion.

Typically these oopses occur because the slab header at the beginning of a
slab is overwritten. Enable debugging. Switching to SLUB would give better
diagnostics.

After turning on the suggested debugging options, I got tons of these when stressing the tape device as before:

Apr 27 16:57:30 fs kernel: [ 96.446708] slab error in verify_redzone_free(): cache `size-128': memory outside object was overwritten
Apr 27 16:57:30 fs kernel: [ 96.446713] Pid: 0, comm: swapper Not tainted 2.6.29.1-64 #2
Apr 27 16:57:30 fs kernel: [ 96.446715] Call Trace:
Apr 27 16:57:30 fs kernel: [ 96.446717] <IRQ> [<ffffffff8029adc5>] __slab_error+0x1f/0x25
Apr 27 16:57:30 fs kernel: [ 96.446728] [<ffffffff8029b24b>] cache_free_debugcheck+0x108/0x1d6
Apr 27 16:57:30 fs kernel: [ 96.446731] [<ffffffff8029b473>] kfree+0x81/0xc2
Apr 27 16:57:30 fs kernel: [ 96.446735] [<ffffffff802bd311>] bio_free_map_data+0xc/0x1e
Apr 27 16:57:30 fs kernel: [ 96.446738] [<ffffffff802bdc6d>] bio_uncopy_user+0x38/0x48
Apr 27 16:57:30 fs kernel: [ 96.446742] [<ffffffff803670e6>] blk_rq_unmap_user+0x1e/0x45
Apr 27 16:57:30 fs kernel: [ 96.446747] [<ffffffff8046ed7f>] st_scsi_execute_end+0x4e/0x5e
Apr 27 16:57:30 fs kernel: [ 96.446751] [<ffffffff8036425f>] blk_end_io+0x55/0x76
Apr 27 16:57:30 fs kernel: [ 96.446754] [<ffffffff804a17ad>] mpt_interrupt+0x422/0x53f
Apr 27 16:57:30 fs kernel: [ 96.446758] [<ffffffff8044be0b>] scsi_io_completion+0x18f/0x415
Apr 27 16:57:30 fs kernel: [ 96.446762] [<ffffffff80368160>] blk_done_softirq+0x62/0x72
Apr 27 16:57:30 fs kernel: [ 96.446766] [<ffffffff802523d0>] __do_softirq+0x7f/0x138
Apr 27 16:57:30 fs kernel: [ 96.446770] [<ffffffff80238d70>] ack_apic_level+0x46/0xce
Apr 27 16:57:30 fs kernel: [ 96.446774] [<ffffffff80225b3c>] call_softirq+0x1c/0x28
Apr 27 16:57:30 fs kernel: [ 96.446777] [<ffffffff8022706c>] do_softirq+0x2c/0x6c
Apr 27 16:57:30 fs kernel: [ 96.446780] [<ffffffff802272b1>] do_IRQ+0xb6/0xd5
Apr 27 16:57:30 fs kernel: [ 96.446784] [<ffffffff80225413>] ret_from_intr+0x0/0xa
Apr 27 16:57:30 fs kernel: [ 96.446785] <EOI> [<ffffffff80564e7a>] udp_poll+0x0/0x10e
Apr 27 16:57:30 fs kernel: [ 96.446793] [<ffffffff8022b26c>] mwait_idle+0x63/0x66
Apr 27 16:57:30 fs kernel: [ 96.446795] [<ffffffff802238d6>] cpu_idle+0x40/0x5e
Apr 27 16:57:30 fs kernel: [ 96.446798] ffff88013c197b48: redzone 1:0xd84156c5635688c0, redzone 2:0xffffe20004209348.
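To read the last line of that report, it helps to know how the SLAB debug code classifies the two guard words around an object. The sketch below is a userspace approximation of that check, not the kernel source: classify_redzones() and the enum are names invented for illustration, and the two magic values are the 64-bit RED_ACTIVE and RED_INACTIVE constants as I recall them from include/linux/poison.h in kernels of this era.

```c
#include <assert.h>
#include <stdint.h>

/* Redzone magic values (assumed from include/linux/poison.h, 64-bit). */
#define RED_INACTIVE 0x09F911029D74E35BULL /* object is free      */
#define RED_ACTIVE   0xD84156C5635688C0ULL /* object is allocated */

/* Hypothetical result type for this sketch. */
enum rz_result { RZ_OK, RZ_DOUBLE_FREE, RZ_OVERWRITTEN };

/* Approximates the decision made when an object is freed:
 * redzone1 sits before the object, redzone2 after it. */
static enum rz_result classify_redzones(uint64_t redzone1, uint64_t redzone2)
{
    if (redzone1 == RED_ACTIVE && redzone2 == RED_ACTIVE)
        return RZ_OK;             /* both guards intact */
    if (redzone1 == RED_INACTIVE && redzone2 == RED_INACTIVE)
        return RZ_DOUBLE_FREE;    /* object was already freed */
    return RZ_OVERWRITTEN;        /* "memory outside object was overwritten" */
}

/* Values copied from the log line above. */
static enum rz_result classify_reported(void)
{
    enum rz_result r = classify_redzones(0xd84156c5635688c0ULL,
                                         0xffffe20004209348ULL);
    assert(r == RZ_OVERWRITTEN);
    return r;
}
```

If those constants are right, redzone 1 in the report still holds the intact RED_ACTIVE value while redzone 2 does not, which would suggest that something wrote past the end of the size-128 object rather than before its start.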

Can I help by testing an rc version to see whether this happens there too?

--
Regards,
Bart mmx@xxxxxx