Re: (PLEASE READ THIS) Re: weird 3c590 problems

Peter T. Breuer (ptb@it.uc3m.es)
Mon, 27 Apr 1998 21:42:31 +0200 (MET DST)


Actually, I'll just comment on one more thing here, since it's an
interesting topic, and one almost completely ignored by the folks out
there developing Linux.

"A month of sundays ago Michael H. Warfield wrote:"
>
> Sorry for interjecting here but this hits one of my button pushes...

> [ptb wrote]
> > I can't take chances in a production environment. Any single failure
> > would take days to locate and require enormous work in updates whilst
> > users were howling at me. I backport useful changes one at a time. I

> I've got an engineering department with several dozen engineers plus
> QA people, support people, etc, etc, etc... I've never lost "days to locate"
> and you are devoting an order of magnitude more work into "backporting useful
> changes" and taking an order of magnitude greater risk than I have ever
> had to deal with. Sooo... From what I read in that paragraph above, you

How do you know that? Is there some way you have of evaluating the risk
that 20 assorted developers and 100 unofficial beta testers have made a
mistake against the risk that I have made a mistake? Remember that I
have the advantage of seeing the bug reports and changes and patches
volunteered by everyone. I can decide for myself if a change _they_
make is risky or not by

1) inspection of the code and patch and explanation given
2) examination of the consequences of not making a change
3) evaluating the change in a gradual sequence of trials

but ONLY if I am able to isolate those changes. My objection to taking a
new kernel wholesale is precisely that I do not have the chance to
evaluate the changes made one by one. I want them documented and made
independent, or available as a sequence of minipatches. That ain't the
way it is (but it came close to happening in the pre* series), so I
ain't buying until I see with my own eyes that

a) I need something
b) it works

Along the 2.0.* path I saw a whole mess of new errors introduced.
2.0.30 or thereabouts was a disaster that took ages to recover from in
terms of broken internal interfaces and I saw no incentive to move up
from where I then was at 2.0.25. Things have just stayed that way since
then while I have waited for stability. There has been mounting
evidence of greater and greater stability in the 2.0.3* releases and I
am very thankful for that. I look at those changes and decide what to
do about them. There is pressure to move up in some ways (I need the
newer vm86 code, for example, to make newer versions of dosemu work with
newer libcs, but I have been able to move the vm86 support into a module
for the meantime). But the pressure to move up has to be greater than
the incentive to stay put for me to move.

Sure, I probably will make a mistake, either by omission, confusion, or
commission. But I won't incorporate other people's mistakes of that kind
if they are not in the code I have studied, and they HAVE made those
mistakes. We know that. I'd rather stick with the bugs I have until I
really can't cope. Life has been fine so far.

I just think that I am more cautious than the developers in general. I
am not budging my codebase until I have to.

And yes, I have not yet been forced to. If, however, it turns out that
I have managed to break the tcp stack, then maybe I'll think again.

> are in the very situation that you are striving to avoid. You are devoting
> enormous efforts into backporting and are taking serious risks in a

Because I am devoting enormous efforts to maintaining a stable set of
systems. I would not distinguish the work in keeping the kernel
code tuned from the work in keeping the application base working.
In fact the work going into maintaining our application base (~500MB in
/usr/local and rising) effectively means that I don't move the kernel
out from under it. 2.0.26 changed something like CHR_DEV_MAX, for
example. Suppose an application depended on that? (Silly example.)

> I can also quote you horror story after horror story of administrators
> who have done exactly what you have done and paid the price. Support people
> refer to some of these calls as "why on my shift" calls. We had one such

> You are gambling on a gimix-gimash of patches, hacks, and cobbles
> on an out of date kernel. In your case, I would say, all bets are off!

And is that any different from a mish-mash of patches, hacks, and
cobbles in an up-to-date kernel? I still keep 1.2.13 as a backup kernel
on all machines! And I sometimes need it. For obscure reasons one can
get hardware that boots via the net into a state where 2.0.* will not
boot (I know why...) but 1.2.13 will.

> fix for that version. If we changed something, it would no longer be that
> version. You need the latest binaries". Eventually they get past the
> semantics and realize that to get what they want, they have to get a newer
> stable version and not keep asking for some untested, unstable, unsupportable
> hack to a previous version.

Those are your adjectives. "Previous version" does not imply unstable
and untested. On the contrary, I assure you that my previous versions
plus patches are both stable and tested: months of testing on hundreds
of machines with a huge variety of hardware. Yes, it is unsupported,
and so what? Is anyone buying support contracts around here?

> > Thanks. I really appreciate the response (really!)
>
> --
> Michael H. Warfield | (770) 985-6132 | mhw@WittsEnd.com
> (The Mad Wizard) | (770) 925-8248 | http://www.wittsend.com/mhw/
> NIC whois: MHW9 | An optimist believes we live in the best of all
> PGP Key: 0xDF1DD471 | possible worlds. A pessimist is sure of it!

Peter ptb@it.uc3m.es

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu