Re: [PATCH v1] mm/gup: remove (VM_)BUG_ONs
From: Lorenzo Stoakes
Date: Fri Jun 06 2025 - 14:07:31 EST
On Fri, Jun 06, 2025 at 10:57:44AM -0700, John Hubbard wrote:
> On 6/6/25 4:04 AM, Lorenzo Stoakes wrote:
> > On Fri, Jun 06, 2025 at 12:28:28PM +0200, David Hildenbrand wrote:
> >> On 06.06.25 12:19, Lorenzo Stoakes wrote:
> >>> On Fri, Jun 06, 2025 at 12:13:27PM +0200, Michal Hocko wrote:
> >>>> On Fri 06-06-25 11:01:18, David Hildenbrand wrote:
> >>>>> On 06.06.25 10:31, Michal Hocko wrote:
> >>>> [...]
> > So to me the only assessment needed is 'do we want to warn on this or not?'.
> >
> > And as you say, really WARN_ON_ONCE() seems appropriate, because nearly always
> > we will get flooded with useless information.
> >
>
> As yet another victim of such WARN_ON() floods at times, I've followed
> this thread with great interest. And after reflecting on it a bit, I believe
> that, surprisingly enough, WARN_ON() is a better replacement for VM_BUG_ON()
> than WARN_ON_ONCE(), because:

Right, these shouldn't be happening _at all_.

I'm easy on this point. I'd say in that case VM_WARN_ON() is the most
_conservative_ approach, since these are things that must not happen, so
it's not unreasonable to leave repetitions of the 'impossible'
unsuppressed :)

But I get the general point about ...WARN_ON_ONCE() avoiding floods.

David, what do you think?

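
To make the flood trade-off concrete, here's a rough userspace sketch. It
is not kernel code: sketch_warn_on(), sketch_warn_on_once() and walk_pages()
are made-up names, the real macros dump a backtrace rather than bump a
counter, and the real 'once' state lives per call site rather than in a
caller-supplied flag.

```c
#include <stdbool.h>

/* Stand-in for a WARN_ON()-style check: reports every time cond is true. */
static bool sketch_warn_on(bool cond, int *reports)
{
	if (cond)
		(*reports)++;
	return cond;
}

/* Stand-in for a WARN_ON_ONCE()-style check: reports only the first hit. */
static bool sketch_warn_on_once(bool cond, int *reports, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		(*reports)++;
	}
	return cond;
}

/*
 * Hypothetical per-page loop in which the 'impossible' condition holds
 * for every page. Returns how many reports would have reached the log.
 */
static int walk_pages(int nr_pages, bool once)
{
	int reports = 0;
	bool warned = false;

	for (int i = 0; i < nr_pages; i++) {
		bool bad = true; /* assumption: every page trips the check */

		if (once)
			sketch_warn_on_once(bad, &reports, &warned);
		else
			sketch_warn_on(bad, &reports);
	}
	return reports;
}
```

With 512 pages the first variant reports 512 times and the second once,
which is the whole argument in miniature: one floods, the other is easier
to lose among other output.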
>
> * The only time you'll be flooded with WARN_ON() messages is when *two*
> things happen at once:
>
> a) Something that used to completely crash the machine (a VM_BUG_ON
> condition) happens, and
>
> b) You're in a loop and it keeps on happening. Yes, in -mm, that does
> happen a lot (per-page loops, for example), but still.
>
> * It's *so* easy to miss a WARN_ON_ONCE(). We don't want that, not for a
> critical failure case that used to be a VM_BUG_ON().

However, I do dispute this point - warnings are pretty easy to pick up from
my point of view unless your dmesg is absolutely rammed, and if you're
concerned you can set panic_on_warn, right?

I treat any warning that I see, for instance in a test run on qemu, as a
'must fix' problem, let alone one observed on an actual hardware system.

Are you thinking of scenarios where, for instance, you have a lot of debug
output in dmesg so these fly by, and when you retry the operation the
warning won't show again and so gets missed that way?

But of course a repeating WARN_ON() won't do much to help you should the
operation be one you happen to perform only once! :)

>
>
> thanks,
> --
> John Hubbard
>
Cheers, Lorenzo