Re: Free memory never fully used, swapping

From: Simon Kirby
Date: Wed Nov 24 2010 - 14:17:57 EST


On Wed, Nov 24, 2010 at 09:27:53AM +0000, Mel Gorman wrote:

> On Tue, Nov 23, 2010 at 10:43:29PM -0800, Simon Kirby wrote:
> > On Tue, Nov 23, 2010 at 10:04:03AM +0000, Mel Gorman wrote:
> >
> > > On Mon, Nov 22, 2010 at 03:44:19PM -0800, Andrew Morton wrote:
> > > > On Mon, 15 Nov 2010 11:52:46 -0800
> > > > Simon Kirby <sim@xxxxxxxxxx> wrote:
> > > >
> > > > > I noticed that CONFIG_NUMA seems to enable some more complicated
> > > > > reclaiming bits and figured it might help since most stock kernels seem
> > > > > to ship with it now. This seems to have helped, but it may just be
> > > > > wishful thinking. We still see this happening, though maybe to a lesser
> > > > > degree. (The following observations are with CONFIG_NUMA enabled.)
> > > > >
> > >
> > > Hi,
> > >
> > > As this is a NUMA machine, what is the value of
> > > /proc/sys/vm/zone_reclaim_mode ? When enabled, this reclaims memory
> > > local to the node in preference to using remote nodes. For certain
> > > workloads this performs better but for users that expect all of memory
> > > to be used, it has surprising results.
> > >
> > > If set to 1, try testing with it set to 0 and see if it makes a
> > > difference. Thanks
> >
> > Hi Mel,
> >
> > It is set to 0. It's an Intel EM64T...I only enabled CONFIG_NUMA since
> > it seemed to enable some more complicated handling, and I figured it
> > might help, but it didn't seem to. It's also required for
> > CONFIG_COMPACTION, but that is still marked experimental.
> >
>
> I'm a little surprised that you are bringing compaction up, because unless
> there are high-order allocations involved, it wouldn't make a difference.
> Is there a constant source of high-order allocations in the system, e.g. a
> network card configured to use jumbo frames? A possible consequence of that
> is that reclaim kicks in early to free order-[2-4] pages, which would
> prevent 100% of memory being used.

We /were/ using jumbo frames, but only over a local crossover link to
another node (for DRBD), so I disabled jumbo frames on that interface
and reconnected DRBD, roughly as sketched below.
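
Something like this, for reference (the interface and DRBD resource
names here are placeholders, not our real ones):

ip link set dev eth1 mtu 1500   # back from jumbo to the standard MTU
drbdadm disconnect r0           # r0 = placeholder resource name
drbdadm connect r0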

Even with MTUs back at 1500, we still saw GFP_ATOMIC order=3
allocations coming from __alloc_skb:

perf record --event kmem:mm_page_alloc --filter 'order>=3' -a --call-graph sleep 10
perf trace

imap-20599 [002] 1287672.803567: mm_page_alloc: page=0xffffea00004536c0 pfn=4536000 order=3 migratetype=0 gfp_flags=GFP_ATOMIC|GFP_NOWARN|GFP_NORETRY|GFP_COMP
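
For what it's worth, the same filter works without perf, through the
ftrace event interface (this assumes debugfs is mounted at
/sys/kernel/debug):

echo 'order>=3' > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
cat /sys/kernel/debug/tracing/trace_pipe   # stream matching allocations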

perf report shows:

__alloc_pages_nodemask
alloc_pages_current
new_slab
__slab_alloc
__kmalloc_node_track_caller
__alloc_skb
__netdev_alloc_skb
bnx2_poll_work
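
new_slab/__slab_alloc in that stack look like SLUB grabbing fresh pages
for a kmalloc cache. __netdev_alloc_skb() kmallocs the buffer plus
struct skb_shared_info, and the larger kmalloc caches end up backed by
order-3 pages. A quick way to check (assuming SLUB; the cache name and
order depend on config, and kmalloc-8192 is just my guess for this rx
buffer size):

cat /sys/kernel/slab/kmalloc-8192/order   # often 3, i.e. 32KiB slab pages
grep kmalloc-8192 /proc/slabinfo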

Dave was seeing these on his laptop with an Intel NIC as well. Ralf
noted that the slab cache grows in higher-order blocks, so this is
normal. The GFP_ATOMIC bubbles up from *alloc_skb, I guess.
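
If reclaim really is kicking in early just to keep order-3 blocks free,
as Mel suggested, it should show up in the buddy allocator's free
lists; the fourth count after each zone name is order 3:

watch -n1 cat /proc/buddyinfo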

Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/