Re: VM/networking crash cause #1: page allocation failure (order:1,GFP_ATOMIC)

From: Robert Hancock
Date: Tue Nov 06 2007 - 18:14:54 EST


Frank van Maarseveen wrote:
For quite some time I'm seeing occasional lockups spread over 50 different
machines I'm maintaining. Symptom: a page allocation failure with order:1,
GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
pages, almost no swap used) followed by a lockup (everything dead). I've
collected all (12) crash cases which occurred the last 10 weeks on 50
machines total (i.e. 1 crash every 41 weeks on average). The kernel
messages are summarized to show the interesting part (IMO) they have
in common. Over the years this has become the crash cause #1 for stable
kernels for me (fglrx doesn't count ;).

One note: I suspect that reporting a GFP_ATOMIC allocation failure in an
network driver via that same driver (netconsole) may not be the smartest
thing to do and this could be responsible for the lockup itself. However,
the initial page allocation failure remains and I'm not sure how to
address that problem.

I still think the issue is memory fragmentation but if so, it looks
a bit extreme to me: One system with 2GB of ram crashed after a day,
merely running a couple of TCP server programs. All systems have either
1 or 2GB ram and at least 1G of (merely unused) swap.

These are all order-1 allocations for received network packets that need to be allocated out of low memory (assuming you're using a 32-bit kernel), so it's quite possible for them to fail on occasion. (Are you using jumbo frames?)

That should not be causing a lockup though.. the received packet should just get dropped.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@xxxxxxxxxxxxx
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/