Re: [PATCH] net/skbuff: silence warnings under memory pressure

From: Qian Cai
Date: Tue Sep 03 2019 - 17:42:33 EST


On Tue, 2019-09-03 at 20:53 +0200, Michal Hocko wrote:
> On Tue 03-09-19 11:42:22, Qian Cai wrote:
> > On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> > > On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > > > If there is a risk of flooding the syslog, we should fix this
> > > > generically in the mm layer, not add hundreds of __GFP_NOWARN all
> > > > over the place.
> > >
> > > We do already ratelimit in warn_alloc. If it isn't sufficient then we
> > > can think of different parameters. Or maybe it is the ratelimiting
> > > which doesn't work here. Hard to tell and something to explore.
> >
> > The time-based ratelimit won't work for build_skb(): when a system is
> > under memory pressure, the CPU is fast and IO is slow, it could take a
> > long time to swap and trigger the OOM killer.
>
> I really do not understand what OOM and swapping have to do with
> the ratelimiting here. The sole purpose of the ratelimit is to reduce
> the amount of warnings to be printed. Slow IO might have an effect on
> when the OOM killer is invoked but atomic allocations are not directly
> dependent on IO.

Under heavy memory pressure, the system tries hard to reclaim memory to get
back above the watermarks. However, IO is slow to page out, while the memory
pressure keeps draining the atomic reserves, so some of those build_skb()
allocations eventually fail.

Only with fast IO does swapping finish soon enough for the OOM killer to be
invoked and end the memory pressure.

>
> > I suppose what happens is that those build_skb() allocations are from
> > softirq, and once one of them fails, it calls printk(), which generates
> > more interrupts. Hence, the infinite loop.
>
> Please elaborate more.
>

If you look at the original report, the dump_stack() output for the failed
allocation is:

 <IRQ>
 warn_alloc.cold.43+0x8a/0x148
 __alloc_pages_nodemask+0x1a5c/0x1bb0
 alloc_pages_current+0x9c/0x110
 allocate_slab+0x34a/0x11f0
 new_slab+0x46/0x70
 ___slab_alloc+0x604/0x950
 __slab_alloc+0x12/0x20
 kmem_cache_alloc+0x32a/0x400
 __build_skb+0x23/0x60
 build_skb+0x1a/0xb0
 igb_clean_rx_irq+0xafc/0x1010 [igb]
 igb_poll+0x4bb/0xe30 [igb]
 net_rx_action+0x244/0x7a0
 __do_softirq+0x1a0/0x60a
 irq_exit+0xb5/0xd0
 do_IRQ+0x81/0x170
 common_interrupt+0xf/0xf
 </IRQ>

Since the allocation does not pass __GFP_NOWARN to begin with, it will call:

printk
vprintk_default
vprintk_emit
wake_up_klogd
irq_work_queue
__irq_work_queue_local
arch_irq_work_raise
apic->send_IPI_self(IRQ_WORK_VECTOR)
smp_irq_work_interrupt
exiting_irq
irq_exit

and end up processing pending net_rx_action softirqs again, of which there are
plenty because of the ssh connections etc.