Re: [PATCH v10 08/14] mm: multi-gen LRU: support page table walks

From: Andrew Morton
Date: Fri Apr 15 2022 - 17:32:29 EST


On Fri, 15 Apr 2022 14:11:32 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

> >
> > I grabbed
> > https://kojipkgs.fedoraproject.org//packages/kernel/5.18.0/0.rc2.23.fc37/src/kernel-5.18.0-0.rc2.23.fc37.src.rpm
> > and
>
> Yes, Fedora/RHEL is one concrete example of the model I mentioned
> above (experimental/stable). I added Justin, the Fedora kernel
> maintainer, and he can further clarify.
>
> If we don't want more VM_BUG_ONs, I'll remove them. But (let me
> reiterate) it seems to me that just defeats the purpose of having
> CONFIG_DEBUG_VM.
>

Well, I feel your pain. It was never expected that VM_BUG_ON() would
get subverted in this fashion.

We could create a new MM-developer-only assertion. Might even call it
MM_BUG_ON(). With compile-time enablement but perhaps not a runtime
switch.

With nice simple semantics, please. Like "it returns void" and "if you
pass an expression with side-effects then you lose". And "if you send
a patch which produces warnings when CONFIG_MM_BUG_ON=n then you get to
switch to Windows 95 for a month".
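A minimal userspace sketch of those semantics, assuming nothing beyond the names floated above. MM_BUG_ON() and CONFIG_MM_BUG_ON do not exist in the kernel; here CONFIG_MM_BUG_ON is just a preprocessor symbol, abort() stands in for BUG(), and page_is_sane() is a made-up helper. Both configurations yield a void expression. The disabled one uses sizeof(), so the condition is type-checked (no unused-variable warnings when =n) but never evaluated, and any side effects are lost.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userspace model of the proposed MM_BUG_ON(). CONFIG_MM_BUG_ON is a plain
 * preprocessor symbol standing in for a hypothetical Kconfig option; build
 * with -DCONFIG_MM_BUG_ON to enable the checks. It is left undefined here.
 */
#ifdef CONFIG_MM_BUG_ON
static void mm_bug(const char *expr, const char *file, int line)
{
	fprintf(stderr, "MM_BUG_ON(%s) failed at %s:%d\n", expr, file, line);
	abort(); /* stands in for the kernel's BUG() */
}

/* Enabled: evaluate cond once; both arms are void, so the macro "returns void". */
#define MM_BUG_ON(cond) \
	((cond) ? mm_bug(#cond, __FILE__, __LINE__) : (void)0)
#else
/*
 * Disabled: sizeof() type-checks cond, so variables referenced only in
 * assertions still count as used, but cond is never evaluated. Any side
 * effects inside cond are silently dropped ("then you lose").
 */
#define MM_BUG_ON(cond) ((void)sizeof(!(cond)))
#endif

/* Hypothetical check helper that counts how often it actually runs. */
static int checks_run;

static int page_is_sane(int refcount)
{
	checks_run++;
	return refcount >= 0;
}
```

With CONFIG_MM_BUG_ON undefined, `MM_BUG_ON(!page_is_sane(1));` compiles cleanly but never calls page_is_sane(), so checks_run stays 0. That is exactly the "side effects lose" rule.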

Let's leave the mglru assertions in place for now and let's think about
creating something more suitable, with a view to switching mglru over
to that at a later time.

But really, none of this addresses the core problem: *_BUG_ON() often
kills the kernel. So guess what we just did? We killed the user's
kernel at the exact time when we least wished to do so: when they have
a bug to report to us. So the thing is self-defeating.

It's much, much better to WARN and to attempt to continue.  This makes
it much more likely that we'll get to hear about the kernel flaw.
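A minimal userspace sketch of "WARN and attempt to continue". It relies on the fact that the kernel's WARN_ON() evaluates to its condition, which enables the `if (WARN_ON(...)) return ...;` idiom. SIM_WARN_ON() and shift_page_count() are hypothetical stand-ins, not kernel APIs. On an impossible input the helper logs a warning, refuses the operation and returns an error, so the caller keeps running and the report can still reach us.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings; /* how many warnings have been emitted */

/* Log the failed condition and hand it back to the caller. */
static bool sim_warn(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warnings++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

/* Userspace stand-in for WARN_ON(): warns, then evaluates to the condition. */
#define SIM_WARN_ON(cond) sim_warn(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical helper: move nr pages from one counter to another. */
static int shift_page_count(int *src, int *dst, int nr)
{
	/* Impossible state: report it and bail out instead of crashing. */
	if (SIM_WARN_ON(nr < 0 || nr > *src))
		return -EINVAL;
	*src -= nr;
	*dst += nr;
	return 0;
}
```

A BUG_ON() in the same spot would take the task (or the whole machine) down before the report is filed. Here the bad call leaves both counters untouched and returns -EINVAL.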