Re: Default zone_reclaim_mode = 1 on NUMA kernel is badforfile/email/web servers

From: Bron Gondwana
Date: Tue Sep 28 2010 - 08:51:37 EST


On Tue, 28 Sep 2010 07:35 -0500, "Christoph Lameter" <cl@xxxxxxxxx> wrote:
> > The problem we saw was purely with file caching. The application wasn't
> > actually allocating much memory itself, but it was reading lots of files
> > from disk (via mmap'ed memory mostly), and as most people would, we
> > expected that data would be cached in memory to reduce future reads from
> > disk. That was not happening.
>
> Obviously and you have stated that numerous times. Problem that the use
> of
> a remote memory will reduced performance of reads so the OS (with
> zone_reclaim=1) defaults to the use of local memory and favors reclaim of
> local memory over the allocation from the remote node. This is fine if
> you have multiple applications running on both nodes because then each
> application will get memory local to it and therefore run faster. That
> does not work with a single app that only allocates from one node.

Is this what's happening, or is IO actually coming from disk in preference
to the remote node? I can certainly see the logic behind preferring to
reclaim the local node if that's all that's happening - though the OS should
be allocating the different tasks more evenly across the nodes in that case.

> Control over memory allocations over the various nodes under NUMA
> for a process can occur via the numactl ctl or the libnuma C apis.
>
> F.e.e
>
> numactl --interleave ... command
>
> will address that issue for a specific command that needs to go

Gosh what a pain. While it won't kill us too much to add to our
startup, it does feel a lot like the tail is wagging the dog from here
still. A task that doesn't ask for anything special should get sane
defaults, and the cost of data from the other node should be a lot
less than the cost of the same data from spinning rust.

Bron.
--
Bron Gondwana
brong@xxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/