Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized

From: Ingo Molnar
Date: Sun Sep 24 2017 - 05:16:47 EST



* Jean Delvare <jdelvare@xxxxxxx> wrote:

> Hi Ingo,
>
> On Sat, 23 Sep 2017 12:50:31 +0200, Ingo Molnar wrote:
> > * Jean Delvare <jdelvare@xxxxxxx> wrote:
> >
> > > I don't think it makes sense to check for a possible bad
> > > initialization order at run time on every system when it is all
> > > decided at build time.
> > >
> > > A more efficient way to make sure developers do not introduce new
> > > calls to dmi_check_system() too early in the initialization sequence
> > > is to simply document the expected call order. That way, developers
> > > have a chance to get it right immediately, without having to
> > > test-boot their kernel, wonder why it does not work, and parse the
> > > kernel logs for a warning message. And we get rid of the run-time
> > > performance penalty as a nice side effect.
> >
> > Huh? Initialization ordering requirements are very opaque,
>
> They were. Now they are very documented.
>
> > and by removing the debug check any such bugs are actively hidden. How
> > is documentation supposed to uncover such bugs once they happen?
>
> You are looking at it the wrong way around. Documentation is how they
> do not happen in the first place.

That expectation, as a general statement, is very naive and contrary to
experience: documentation is fine for one layer of defense to prevent bugs, but
_when_ they happen and a bug slips through, documentation does not help anymore,
because the dependencies in the _code_ are opaque and non-obvious ...

For example during the early SMP efforts of Linux we used to document lock
dependencies as well, but once the kernel had more than a dozen spinlocks we
periodically ran into deadlocks and the whole design became unmaintainable
quickly. So we have lockdep in addition to documentation.

> You hit this problem once, 9 years ago. You thought it would have been easier to
> debug if there was a warning, and you added it.

I did not just 'think' it would have been easier to debug, I wasted time on that
bug and a warning would have helped so I added it. That was and remains
objectively true.

While I expect most such warnings to never see any public email lists (because
once a developer triggers it it gets fixed without the bug ever getting triggered
by others), yet searching for "dmi check: not initialized yet" still finds a
couple of incidents where real or potential bugs were found by this init
dependency check, such as:

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/289347.html

or this:

https://www.spinics.net/lists/linux-acpi/msg28698.html

... so this warning actually helped a number of kernel developers to not
waste time on the opaque dependency. This is a warning that was added due
to an _actual category of bugs_, which has been triggered subsequently as
well, so it's not a frivolous warning by any meaning.

> [...] It was one way to solve the problem but I claim it was not the best.
>
> What I expect from developers calling a function they aren't familiar
> with is to read its documentation first. That's the very reason why we
> spend time writing the documentation. They should not just call the
> function, boot and see if it works or not. Software engineering vs.
> trial and error.

This statement is breathtaking in its ignorance :-(

> > So NAK.
>
> This was FYI. I maintain this subsystem, and you did not convince me. I also
> can't see a general trend of implementing what you suggest in the rest of the
> kernel. Thankfully.

I find the arrogance displayed here breathtaking as well - the last thing we need
is for firmware interfacing kernel code to become _more_ fragile.

This was and continues to be a useful warning - but what worries me even more is
not just the removal of the warning, but the false and technically invalid
justifications under which it is removed...

For those reasons I maintain my NAK:

Nacked-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo