Re: [PATCH 2/2] module: fix bne2 "gave up waiting for init of module libcrc32c"

From: Rusty Russell
Date: Tue Jun 01 2010 - 23:09:49 EST


On Wed, 2 Jun 2010 12:28:31 am Linus Torvalds wrote:
>
> On Tue, 1 Jun 2010, Rusty Russell wrote:
> >
> > > So explain why I should be more polite, or more terse?
> >
> > How about this:
> >
> > "I hate this patch. It makes already-ugly locking in module.c awful, and I
> > can't see that it's correct. Why not just reduce the locking to cover
> > the minimum needed?"
>
> Umm. Did you read the first few emails I sent out on this thread
> originally?

OK, this might be worthwhile. Take the very first mail you sent:

Hmm. That does seem to be buggy. We can't just drop and re-take the lock:
that may make sense internally as far as resolve_symbol() itself is
concerned, but the caller will its own local variables, and some of those
will no longer be valid if the lock was dropped.

The implication here is that that I don't know locking, and that this was
done without thought or care. In fact, I did worry about it and decided it
was less risky than a locking reduction given my limited cycles (indeed,
it has been slightly).

That commit also changes the return value semantics of "use_module()",
which is an exported interface where the only users seem to be
out-of-kernel (the only in-kernel use is in kernel/module.c itself). That
seems like a really really bad idea too.

Again with the implication that this was done without consideration. I keep
that exported as a courtesy to the ksplice team, but I never intended to let
that shackle internal changes even slightly. And I think it's wrong to even
start down that path (though I changed the name for you in the recent patches)

So I think reverting it is definitely the right thing to do. The commit
seems fundamentally broken. And having modules do request_module() in
their init functions has always been invalid anyway, so that excuse
doesn't really seem to be a reason to do anything crazy like this either.

The code is "fundamentally broken" and "crazy". The purpose of the patch
was "invalid anyway".

You managed to accuse me four times of being an incompetent and downright
crazy maintainer in your first three paragraphs of the first mail!

> They were actually _politer_ than your suggestion above (no
> crap mentioned), and did exactly what you ask for.

My example quote entirely lacked accusations of incompetence; the only
implications are:
(1) Hey, I might be wrong, maybe I just didn't spend enough time on it.
(2) Otherwise, here's what I'd do, you're smart enough to run with it if
it makes sense or explain why I'm barking up the wrong tree.

See Andrew Morton's correspondence for how his "maybe I'm dumb but" tone
is used to brutal effect...

> I would like to note that your original "fix things by dropping the lock"
> patch that I hated so much had this exact bug too. Making this a good
> example of _why_ it's basically always wrong to drop locks in the middle -
> even if you claimed you knew and understood the locking.

And I would like to note that it didn't :) It grabbed references only on
completed modules.

It still had that find_module race, but the kset error actually it. The
lock reduction patch needs that fixed.

> So I do hope we can agree to call the module locking "total and utter
> crap", ok?

OK. And I agree that I was too cautious here; I should have taken the time
to fix locking once and for all. That's much easier to do when there's a
Linus yelling at you about it.

> And yes, my code was crap too. I wrote it for the MODULE_UNLOAD case, and
> only added a comment saying the !unload case was broken, but didn't look
> at just _how_ broken it was. My bad.

Oh no, your code was broken for UNLOAD too.

And sloppy; you would have got a checklist reply email if you had mailed it
anonymously :)

Thanks for the consideration,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/