Re: OOM detection regressions since 4.7

From: Michal Hocko
Date: Mon Aug 29 2016 - 11:07:11 EST


On Mon 29-08-16 16:52:03, Olaf Hering wrote:
> On Thu, Aug 25, Olaf Hering wrote:
>
> > On Thu, Aug 25, Michal Hocko wrote:
> >
> > > Any luck with the testing of this patch?
>
> I ran rc3 for a few hours on Friday and FireFox was not killed.
> Now rc3 is running for a day with the usual workload and FireFox is
> still running.

Is the patch
(http://lkml.kernel.org/r/20160823074339.GB23577@xxxxxxxxxxxxxx) applied?

> Today I noticed the nfsserver was disabled, probably for months already.
> Starting it gives an OOM, not sure if this is new with 4.7+.
> Full dmesg attached.
> [93348.306369] modprobe: page allocation failure: order:4, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

OK, so the order-4 (COSTLY) allocation has failed because

[...]
> [93348.313778] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
> [93348.313803] Node 0 DMA32: 13633*4kB (UME) 8035*8kB (UME) 890*16kB (UME) 10*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133372kB
> [93348.313822] Node 0 Normal: 14003*4kB (UME) 25*8kB (UME) 2*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56244kB

the memory is too fragmented for such a large allocation. A failing
order-4 request is not that severe because we do not invoke the OOM
killer when it fails. Without __GFP_REPEAT in particular, we do not even
try too hard. The recent OOM detection changes should not have changed
this behavior.
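
To make the fragmentation argument concrete, here is a minimal Python
sketch that counts the free blocks of order 4 or larger in each zone,
using the three buddy lines quoted above. It is only an illustration and
not part of any kernel tooling: the helper names are made up, and it
assumes 4kB base pages, as the smallest block size in the log suggests.

```python
import re

# Buddy free-list lines copied from the dmesg excerpt quoted above.
BUDDY_LINES = [
    "Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) "
    "1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB",
    "Node 0 DMA32: 13633*4kB (UME) 8035*8kB (UME) 890*16kB (UME) 10*32kB (U) "
    "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133372kB",
    "Node 0 Normal: 14003*4kB (UME) 25*8kB (UME) 2*16kB (UM) 0*32kB 0*64kB "
    "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56244kB",
]

PAGE_KB = 4  # assumption: 4kB base pages


def zone_name(line):
    """Extract the zone name (e.g. 'DMA32') from a 'Node N ZONE: ...' line."""
    return re.search(r"Node \d+ (\w+):", line).group(1)


def free_blocks_by_order(line):
    """Return {order: free block count} for one buddy free-list line."""
    counts = {}
    # Each entry looks like '<count>*<size>kB'; the trailing '= <total>kB'
    # has no '*' and is therefore skipped.
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def blocks_at_or_above(line, order):
    """Count free blocks large enough to satisfy a request of this order."""
    return sum(n for o, n in free_blocks_by_order(line).items() if o >= order)


if __name__ == "__main__":
    for line in BUDDY_LINES:
        print(zone_name(line), blocks_at_or_above(line, 4))
```

For the excerpt above this reports 9 suitable blocks in DMA and none in
DMA32 or Normal. The DMA zone is normally held back from ordinary
GFP_KERNEL requests by the lowmem reserves, although that part is an
assumption and cannot be confirmed from the quoted lines alone.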

--
Michal Hocko
SUSE Labs