Re: [PATCH] net/skbuff: silence warnings under memory pressure

From: Michal Hocko
Date: Wed Sep 04 2019 - 08:07:10 EST


On Wed 04-09-19 07:59:17, Qian Cai wrote:
> On Wed, 2019-09-04 at 10:25 +0200, Michal Hocko wrote:
> > On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
> > > On (09/04/19 15:41), Sergey Senozhatsky wrote:
> > > > But the thing is different in case of dump_stack() + show_mem() +
> > > > some other output. Because now we ratelimit not a single printk() line,
> > > > but hundreds of them. The ratelimit becomes - 10 * $$$ lines in 5 seconds
> > > > (IOW, now we talk about thousands of lines).
> > >
> > > And on devices with slow serial consoles this can be somewhat close to
> > > "no ratelimit". *Suppose* that warn_alloc() adds 700 lines each time.
> > > Within 5 seconds we can call warn_alloc() 10 times, which will add 7000
> > > lines to the logbuf. If printk() can evict only 6000 lines in 5 seconds
> > > then we have a growing number of pending logbuf messages.
> >
> > Yes, ratelimit is problematic when the ratelimited operation is slow. I
> > guess that is a well known problem and we would need to rework both the
> > api and the implementation to make it work in those cases as well.
> > Essentially we need to make the ratelimit act as a gatekeeper to an
> > operation section - something like a critical section except you can
> > tolerate more code executions but not too many. So effectively
> >
> > start_throttle(rate, number);
> > /* here goes your operation */
> > end_throttle();
> >
> > one operation is not considered done until the whole section ends.
> > Or something along those lines.
> >
> > In this particular case we can increase the rate limit parameters of
> > course but I think that longterm we need a better api.
>
> The problem is when a system is under heavy memory pressure, everything is
> becoming slower, so I don't know how to come up with a sane default for rate
> limit parameters as a generic solution that would work for every machine out
> there. Sure, it is possible to set a limit as low as possible that would work
> for the majority of systems apart from people may complain that they are now
> missing important warnings, but using __GFP_NOWARN in this code would work for
> all systems. You could even argument there is even a separate benefit that it
> could reduce the noise-level overall from those build_skb() allocation failures
> as it has a fall-back mechanism anyway.

As Vlastimil already pointed out, __GFP_NOWARN would hide that reserves
might be configured too low.
--
Michal Hocko
SUSE Labs