Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

From: Michal Hocko
Date: Thu Jun 27 2019 - 04:10:33 EST


On Thu 27-06-19 10:50:57, Alastair D'Silva wrote:
> On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
> > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > > > From: Alastair D'Silva <alastair@xxxxxxxxxxx>
> > > > >
> > > > > If a memory section comes in where the physical address is
> > > > > greater
> > > > > than
> > > > > that which is managed by the kernel, this function would not
> > > > > trigger the
> > > > > bug and instead return a bogus section number.
> > > > >
> > > > > This patch tracks whether the section was actually found, and
> > > > > triggers the
> > > > > bug if not.
> > > >
> > > > Why do we want/need that? In other words the changelog should
> > > > contina
> > > > WHY and WHAT. This one contains only the later one.
> > > >
> > >
> > > Thanks, I'll update the comment.
> > >
> > > During driver development, I tried adding peristent memory at a
> > > memory
> > > address that exceeded the maximum permissable address for the
> > > platform.
> > >
> > > This caused __section_nr to silently return bogus section numbers,
> > > rather than complaining.
> >
> > OK, I see, but is an additional code worth it for the non-development
> > case? I mean why should we be testing for something that shouldn't
> > happen normally? Is it too easy to get things wrong or what is the
> > underlying reason to change it now?
> >
>
> It took me a while to identify what the problem was - having the BUG_ON
> would have saved me a few hours.
>
> I'm happy to just have the BUG_ON 'nd drop the new error return (I
> added that in response to Mike Rapoport's comment that the original
> patch would still return a bogus section number).

Well, BUG_ON is about the worst way to handle an incorrect input. You
really do not want to put a production environment down just because
there is a bug in a driver, right? There are still many {VM_}BUG_ONs
in the tree and there is a general trend to get rid of many of them
rather than adding new ones.

Now back to your patch. You are adding an error handling where we simply
do not expect invalid data. This is often the case for the core kernel
functionality because we do expect consumers of the code to do the right
thing. E.g. __section_nr already takes a pointer to struct section which
assumes that this core data structure is already valid. Adding a check
here adds an unnecessary overhead so this doesn't really sound like a
good idea to me.
--
Michal Hocko
SUSE Labs