RE: revert yenta free_irq on suspend

From: Linus Torvalds
Date: Sun Jul 31 2005 - 00:34:03 EST




On Sun, 31 Jul 2005, Brown, Len wrote:
>
> If one believes that suspend/resume is working on a large number of
> systems -- working to a level that a distro can acutally support it,
> then restoring our temporary resume IRQ router hack to make many systems
> work is clearly the right thing to do.

I don't believe that it works on a huge number of devices as-is, no.

But I'm definitely even less likely to believe in this "two steps forward,
one step back" dance. Because as far as I can tell, it's equally often
"one step forward, two steps back", and nobody can tell when we go forware
more than we go backwards.

So I'd rather have "one tenth of a step forward, but if we see even a
_hint_ of a step back, we revert immediately".

I realize that sounds damn timid and boring, but the thing is:

- even _if_ (and quite frankly, judging by the complaints, I find that
unlikely) we're doing more forward progress than backwards progress
("backwards progress? you moron!"), the "one step back" thing is really
doing a _huge_ amount of psychological damage to the whole thing.

The thing is, we're better off making very very slow progress that is
_steady_, than having people who _used_ to have things work for them
suddenly break.

So I believe that if we fix two machines and break one machine, we've
actually regressed. It doesn't matter that we fixed more than we broke:
we _still_ regressed. Because it means that people can't trust the
progress we make!

So this is why I'm a very strong proponent of the fact that if we _ever_
have anybody complain that a patch broke things, we should immediately
revert it, unless we can fix it asap.

Btw, this argument is much less true in areas where we can "think" about
the problems. In non-driver/non-firmware cases.

When we can logically argue from a purely theoretical standpoint for a
"known correct solution", and expect the theoretical argument to actually
be reflected in practice, I'm much more open to an argument of "ok, we
know where we are going, and we'll have to break a few eggs just because
the changes are extensive".

But when it comes to device drivers and badly documented stuff that
developers can usually not even reproduce, our #1 strength is the people
for whom it works, and when something breaks, that's a huge big red flag,
and then I really think that "revert or fix immediately" is the only
reasonable alternative. Otherwise we'll just oscillate about a point that
we don't even know where it is, and have no way to judge if the
oscillations are getting more violent or are dampening out - we don't have
a reference point.

But as with everything, there is no total black-and-white case. Things
_do_ break occasionally, and clearly we can't guarantee nonbreakage or
we'll end up being static.

But this particular thing has clearly caused a _lot_ of noise, with
second-order breakage from fixes from the first order one. At the very
least alternatives should be tried, and I think there _are_ much less
intrusive alternatives that are _much_ less likely to have these kinds of
negative side effects.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/