Re: [RFC] vmalloc: add warning in __vmalloc

From: David Rientjes
Date: Tue May 01 2012 - 16:23:00 EST


On Tue, 1 May 2012, Nick Piggin wrote:

> > I disagree with this approach since it's going to violently spam an
> > innocent kernel user's log with no ratelimiting and for a situation that
> > actually may not be problematic.
>
> With WARN_ON_ONCE, it should be good.
>

To catch a single instance of this per boot, sure. I've never seen us add
WARN_ON_ONCE()s where we have concrete examples of kernel code that will
trigger them, though. I'm not sure why we'd spam the kernel log and get
users to think something is wrong and report a bug when it's possible to
audit the code and make a report to the subsystem maintainer. Perhaps
adding WARN_ON_ONCE()s and then walking away from it is just easier?

> > Passing any of these bits (the difference between GFP_KERNEL and
> > GFP_ATOMIC) only means anything when we're going to do reclaim. And I'm
> > suspecting we would have seen problems with this already since
> > pte_alloc_kernel() does __GFP_REPEAT on most architectures meaning that it
> > will loop infinitely in the page allocator until at least one page is
> > freed (since it's an order-0 allocation) which would hardly ever happen if
> > __GFP_FS or __GFP_IO actually meant something in this context.
> >
> > In other words, we would already have seen these deadlocks and it would
> > have been diagnosed as a vmalloc(GFP_ATOMIC) problem. Where are those bug
> > reports?
>
> That's not sound logic to disprove a bug.
>
> I think simply most callers are permissive and don't mask out flags.
> But for example a filesystem holding an fs lock and then doing
> vmalloc(GFP_NOFS) can certainly deadlock.
>

I'm not disproving a bug; I'm asking for an example where this problem
has caused pain before and was traced back to a call to
vmalloc(GFP_NOFS). I agree we should certainly fix those callers, but
adding the WARN_ON_ONCE()s seems certain to cause pain in the form of
tons of bug reports where there's no actual problem that couldn't have
been found by auditing the code.
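
For illustration only, here is a minimal userspace sketch of the kind of
check being debated: flag a vmalloc-style call whose mask is stricter
than GFP_KERNEL, and report it only the first time. Everything prefixed
sk_ or SK_ is hypothetical, and the bit values are illustrative rather
than copied from any kernel header. It is not the code from the RFC
patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative flag bits (assumption: values are for this sketch only). */
#define SK_GFP_WAIT 0x10u /* caller may sleep / enter reclaim */
#define SK_GFP_HIGH 0x20u /* may dip into emergency reserves */
#define SK_GFP_IO   0x40u /* reclaim may start physical I/O */
#define SK_GFP_FS   0x80u /* reclaim may call into the filesystem */

#define SK_GFP_ATOMIC (SK_GFP_HIGH)
#define SK_GFP_NOFS   (SK_GFP_WAIT | SK_GFP_IO)
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Bits that an inner GFP_KERNEL allocation assumes the caller permits. */
#define SK_RECLAIM_BITS (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Number of warnings actually printed, so the once-only behavior is visible. */
static int sk_warn_count;

/* True if the mask withholds a bit that the inner allocation would use. */
static bool sk_mask_is_restrictive(unsigned int mask)
{
	return (mask & SK_RECLAIM_BITS) != SK_RECLAIM_BITS;
}

/*
 * Single check site with warn-once behavior: every restrictive mask is
 * detected, but only the first one is reported.
 */
static bool sk_check_vmalloc_mask(unsigned int mask)
{
	static bool warned;
	bool bad = sk_mask_is_restrictive(mask);

	if (bad && !warned) {
		warned = true;
		sk_warn_count++;
		fprintf(stderr, "sketch: restrictive mask %#x\n", mask);
	}
	return bad;
}
```

Calling sk_check_vmalloc_mask(SK_GFP_NOFS) and then
sk_check_vmalloc_mask(SK_GFP_ATOMIC) detects both masks as restrictive
but prints one message, which is the "single instance per boot"
behavior discussed above. The sketch says nothing about whether a given
caller can actually deadlock; that still requires auditing the caller.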