Re: [PATCH v1] kernel/trace:check the val against the available mem

From: Matthew Wilcox
Date: Fri Mar 30 2018 - 22:19:05 EST


On Fri, Mar 30, 2018 at 09:41:51PM -0400, Steven Rostedt wrote:
> On Fri, 30 Mar 2018 16:38:52 -0700
> Joel Fernandes <joelaf@xxxxxxxxxx> wrote:
>
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > > @@ -1164,6 +1164,11 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
> > > struct buffer_page *bpage, *tmp;
> > > long i;
> > >
> > > + /* Check if the available memory is there first */
> > > + i = si_mem_available();
> > > + if (i < nr_pages)
> >
> > Does it make sense to add a small margin here so that after ftrace
> > finishes allocating, we still have some memory left for the system?
> > But then we have to define a magic number :-|
>
> I don't think so. The memory is allocated by user defined numbers. They
> can do "free" to see what is available. The original patch from
> Zhaoyang was due to a script that would just try a very large number
> and cause issues.
>
> If the memory is available, I just say let them have it. This is
> borderline user space issue and not a kernel one.

Again though, this is the same pattern as vmalloc. There are any number
of places where userspace can cause an arbitrarily large vmalloc to be
attempted (grep for kvmalloc_array for a list of promising candidates).
I'm pretty sure that just changing your GFP flags to GFP_KERNEL |
__GFP_NOWARN will give you the exact behaviour that you want with no
need to grub around in the VM to find out if your huge allocation is
likely to succeed.
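
To illustrate the "just try it and unwind on failure" pattern, here is a
minimal userspace sketch, not the actual ring buffer code. The names
alloc_page_quiet(), allocate_pages(), release_pages() and the budget
counter are all hypothetical; alloc_page_quiet() stands in for a page
allocation done with GFP_KERNEL | __GFP_NOWARN, i.e. one that returns
NULL quietly rather than warning when the memory is not there.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PAGE_SZ 4096

/* Assumption: simulated count of pages the "system" can still hand out. */
static long budget;

struct page_node {
	struct page_node *next;
	void *page;
};

/* Hypothetical quiet allocation: returns NULL on failure, logs nothing. */
static void *alloc_page_quiet(void)
{
	void *page;

	if (budget <= 0)
		return NULL;
	page = malloc(PAGE_SZ);
	if (page)
		budget--;
	return page;
}

/* Free every page on the list and return it to the simulated budget. */
static void release_pages(struct page_node *list)
{
	while (list) {
		struct page_node *next = list->next;

		free(list->page);
		free(list);
		budget++;
		list = next;
	}
}

/*
 * Try to allocate nr_pages without checking available memory first.
 * On the first failure, undo everything and report -ENOMEM.
 */
static int allocate_pages(long nr_pages, struct page_node **head)
{
	struct page_node *list = NULL;
	long i;

	assert(nr_pages >= 0);

	for (i = 0; i < nr_pages; i++) {
		struct page_node *node = malloc(sizeof(*node));

		if (!node)
			goto fail;
		node->page = alloc_page_quiet();
		if (!node->page) {
			free(node);
			goto fail;
		}
		node->next = list;
		list = node;
	}
	*head = list;
	return 0;

fail:
	release_pages(list);
	return -ENOMEM;
}
```

In this sketch an oversized request fails part-way through, frees what it
already took, and returns -ENOMEM to the caller, with no up-front query
of how much memory is free.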