Re: [PATCH 3.3] memcg: free mem_cgroup by RCU to fix oops

From: Hugh Dickins
Date: Mon Mar 12 2012 - 14:44:14 EST


On Mon, 12 Mar 2012, Stanislaw Gruszka wrote:
> On Fri, Mar 09, 2012 at 11:58:34AM -0800, Hugh Dickins wrote:
> > On Wed, 7 Mar 2012, Hugh Dickins wrote:
> > >
> > > I'm posting this a little prematurely to get eyes on it, since it's
> > > more than a two-liner, but 3.3 time is running out. If it is what's
> > > needed to fix my oopses, I won't really be sure before Friday morning.
> > > What's running now on the machine affected is using kfree_rcu(), but I
> > > did hack it earlier to check that the vfree_rcu() alternative works.
> >
> > Yes, please do send that patch on to Linus for 3.3.
> >
> > It did not get as much as the 36 hours of testing I had hoped for, only
> > 25 hours so far. 12 hours while I was out yesterday got wasted by a
> > wireless driver interrupt spewing approximately one million messages:
> >
> > iwl3945 0000:08:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
>
> I replaced that with WARN_ONCE
> http://marc.info/?l=linux-wireless&m=132912863701997&w=2
> (the patch is currently in net-next).

That's very welcome, thank you. One message will suit me better than
a million. But I didn't mention that for every seven of those, there
was one slightly different message coming too:

iwl3945 0000:08:00.0: UNKNOWN (0xFFFFFFFF) 4294967295 ... (I got bored)

>
> > which I've not suffered from before, and hope not again. Having kdb
> > in, I did take a look what was going on with the memcg load when it was
> > interrupted: it appeared to be normal, and I've no reason to suppose that
> > my kfree_rcu() was in any way responsible for the wireless aberration.

I didn't see it again. I rebooted and ran the test for 63 hours and
it went fine this time with no interference from the wifi. I did have
the laptop differently positioned this time: maybe it was overheating
up against the wall, though that position gave no trouble in the past.

>
> I don't know if is possible if test patch influence pci or mac80211 code
> (we use rcu quite intensively in mac8021).

It's very very unlikely that the patch I was testing had a significant
effect on RCU usage: the test ends up doing just one extra call_rcu every
minute. There's a heavy swapping load running alongside, which shouldn't
be disturbing PCI config at all; but it's possible that a temporarily
failing memory allocation, or overheat, drove iwl3945 down a strange
path on this one occasion, and once there it couldn't recover.

Hugh

> Those "MAC is in deep sleep"
> usually mean that wireless device registers can not be read through pcie
> bus - i.e. when pcie bridge is erroneously disabled like in report here:
> http://marc.info/?l=linux-wireless&m=132577331132329&w=2
>
> Note that wireless device is one of a few connected through pcie bridge
> on most of the laptops, others external pcie devices like mmc are not
> used frequently, hence breakage in pci code looks frequently like
> breakage in wireless driver. Not sure if that was the case here though.
>
> Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/