Re: An oops while running 2.5.65-mm2

From: Martin Josefsson (gandalf@wlug.westbo.se)
Date: Thu Mar 20 2003 - 20:19:14 EST


On Thu, 2003-03-20 at 21:29, Andrew Morton wrote:
> jjs <jjs@tmsusa.com> wrote:
> >
> > Greetings -
> >
> > Here is some info about an oops from 2.5.65-mm2
>
> It is not exactly an oops, but it is a warning of a fatal bug.

This might explain the crashes I've seen in my routers but been unable
to capture so far (and without the extra slab debugging I don't know if
I would have been able to find it or even get a decent capture...)

Patch attached that should fix the problem, not tested or even compiled.

Joe, can you please please test it and report back?

> Look at this lovely trace:

> Mar 20 11:06:46 jyro kernel: Slab corruption: start=ceaa2234, expend=ceaa2377, problemat=ceaa22ac
> Mar 20 11:06:46 jyro kernel: Last user: [<e0a12cb0>](destroy_conntrack+0xd0/0x140 [ip_conntrack])
> Mar 20 11:06:46 jyro kernel: Data: ************************************************************************************************************************AC 22 AA CE AC 22 AA CE ***************************************************************************************************************************************************************************************************A5
> Mar 20 11:06:46 jyro kernel: Next: 71 F0 2C .B0 2C A1 E0 71 F0 2C .********************
> Mar 20 11:06:46 jyro kernel: slab error in check_poison_obj(): cache `ip_conntrack': object was modified after freeing

> Looking at the data pattern, it is probably an INIT_LIST_HEAD() against a
> list_head field which is 120 bytes into the object. (problemat - start). Or
> a list_del() against a different object which erroneously remains on a list
> with this object.

You are correct. It was a list_del() that caused it (at least I think
so, it's 2am right now).

1. conntrack helper adds an expectation and adds that to a list hanging
of off a connection.

2. the expected connection arrives. the expectation is still on the
list.

3. the original connection that caused the expectation terminates but
the expectation still thinks it's added to the list.

4. the expected connection terminates and list_del() is called to remove
it from the list which doesn't exist anymore. boom!

> Manfred has extra toys in the works which will be able to unmap slab objects
> from the kernel virtual address space when they are freed. When this debug
> code is working (it will run slowly) we will get an oops at the site of the
> bug.

This will be a nice feature, might make it easier to find the bugs.
Too bad it can't help work out the relationship between structs... I had
to use pen and paper to work out the relationship between connections
and expectations :) That part is almost a little bit hairy.

-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

--- linux-2.5.64-bk10/net/ipv4/netfilter/ip_conntrack_core.c.orig 2003-03-21 01:42:57.000000000 +0100 +++ linux-2.5.64-bk10/net/ipv4/netfilter/ip_conntrack_core.c 2003-03-21 01:44:11.000000000 +0100 @@ -274,6 +274,7 @@ * the un-established ones only */ if (exp->sibling) { DEBUGP("remove_expectations: skipping established %p of %p\n", exp->sibling, ct); + exp->sibling = NULL; continue; } @@ -327,9 +328,11 @@ WRITE_LOCK(&ip_conntrack_lock); /* Delete our master expectation */ if (ct->master) { - /* can't call __unexpect_related here, - * since it would screw up expect_list */ - list_del(&ct->master->expected_list); + if (ct->master->sibling) { + /* can't call __unexpect_related here, + * since it would screw up expect_list */ + list_del(&ct->master->expected_list); + } kfree(ct->master); } WRITE_UNLOCK(&ip_conntrack_lock);

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Mar 23 2003 - 22:00:33 EST