Actual environment size comparison of CML1 and CML2

From: Eric S. Raymond (esr@thyrsus.com)
Date: Sat May 27 2000 - 01:41:58 EST


david parsons <orc@pell.portland.or.us>:
> The whoops-time-to-fork-the-linux-kernel showstopper for me is that
> the reference implementation of this new configuration language is
> written in Python, and, given the fluidity of linux kernel
> development and the impossibility of getting patches to Linus unless
> you're a member of the Core Team, this would probably mean that the
> Python implementation would be the only implementation that would
> ever work.

I hate to ruin a nice juicy flamewar by introducing such dull things as
facts into it, but...

On Red Hat, a Python 1.5.2 RPM installation looks as though it
requires about 5M. This is probably more than a bare-bones
install on other distributions; it includes the Tix widget set
and a bunch of other goodies. I shall bend over backwards to
the Python-dislikers and ignore that detail. 5M sure sounds like a
lot, doesn't it?

Sure does. Until you start thinking about the actual numbers attached
to various possible alternatives. Here are some byte sizes I
collected from my Red Hat 6.2 system and the 2.3.99pre9 kernel tree:

 4,971,072 Python 1.5.2
16,290,796 Perl-5.00503
 2,001,475 Tcl/Tk

   251,538 CML1 config files
   156,183 CML2 rulebase

 1,976,362 CML1 tools (with generated tk files needed to run)
   177,143 CML2 tools (with generated pyc files needed to run)

   165,254 bison-1.28
   309,583 flex-2.5.4a

One thing we see right away is that moving from CML1 to CML2 shaves 1,894,574
bytes out of the kernel tree itself. That's not the measure that seems
to exercise people, however.
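
The in-tree saving can be re-derived from the table above. The snippet below is only a sketch: the variable names are made up for illustration, and it assumes "the kernel tree itself" means the config files (or rulebase) plus the tools, which is the reading that reproduces the quoted figure.

```python
# Byte counts copied from the table in this message.
cml1_config = 251_538    # CML1 config files
cml1_tools = 1_976_362   # CML1 tools (with generated tk files)
cml2_rulebase = 156_183  # CML2 rulebase
cml2_tools = 177_143     # CML2 tools (with generated pyc files)

# Assumption: the in-tree footprint is config/rulebase plus tools.
cml1_in_tree = cml1_config + cml1_tools
cml2_in_tree = cml2_rulebase + cml2_tools

saving = cml1_in_tree - cml2_in_tree
print(f"CML1 in tree: {cml1_in_tree:,} bytes")
print(f"CML2 in tree: {cml2_in_tree:,} bytes")
print(f"Saving:       {saving:,} bytes")
```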

Another thing we see is that anybody who'd take Perl over Python
on size-economy grounds is smoking serious drugs and should be taken
somewhere to calm down. Perl has its uses but if what we're after
is a minimalist build environment this is not one of them.

So let's compare the size of minimum environments needed for a kernel
build under a couple of more realistic scenarios. We'll agree
not to count stuff like sh, make, and gcc that the kernel needs
anyway.

CML1:
   sizeof(CML1 tools) + sizeof(CML1 rulebase) + sizeof(Tcl/Tk) = 4,229,375

CML2-in-Python:
   sizeof(CML2 tools) + sizeof(CML2 rulebase) + sizeof(Tcl/Tk)
   + sizeof(Python) = 5,304,418

CML2-in-C:
   sizeof(CML2 tools) + sizeof(CML2 rulebase) + sizeof(Tcl/Tk)
   + sizeof(Bison) + sizeof(Flex)

It's interesting to notice exactly where CML1 is porking up. It
turns out that the generated tk files make a lot of the difference --
kconfig.tk alone is 1,567,874 bytes, over 1.5M. CML2 makes all that go
away.

Now let's consider the minimum build-environment size for a
hypothetical pure-C implementation of CML2. Let's start with the
parts we can total up:

    sizeof(CML2 rulebase) + sizeof(Tcl/Tk) + sizeof(Bison) + sizeof(Flex)

Why am I including Tcl/Tk? Because there is no other toolkit for the
GUI mode that is (a) anywhere near as stable, or (b) at all likely to
*be* in a minimum distribution. GTK ain't stable enough yet, nor
deployed enough. So the minimum size for CML2-in-C would be 2,632,495
bytes.
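
Two of the totals can be reproduced directly from the table. This is a sketch with hypothetical dictionary keys; it assumes the last term of the CML2-in-C sum is the flex-2.5.4a figure from the table, which is the value that yields 2,632,495.

```python
# Byte counts copied from the table in this message.
SIZES = {
    "tcl_tk": 2_001_475,
    "cml1_config": 251_538,
    "cml1_tools": 1_976_362,
    "cml2_rulebase": 156_183,
    "bison": 165_254,
    "flex": 309_583,
}

def total(*names):
    """Sum the byte sizes of the named components."""
    return sum(SIZES[name] for name in names)

# Case 1: the CML1 minimum environment.
cml1_env = total("cml1_tools", "cml1_config", "tcl_tk")

# Case 3: the parts of CML2-in-C that can be totalled up,
# i.e. everything except the (unknown) object code itself.
cml2_c_partial = total("cml2_rulebase", "tcl_tk", "bison", "flex")

print(f"CML1:              {cml1_env:,}")
print(f"CML2-in-C (parts): {cml2_c_partial:,}")
```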

Let's look at those three numbers:

Case 1: CML1 = 4,229,375
Case 2: CML2-in-Python = 5,304,418
Case 3: CML2-in-C = 2,632,495 (without the CML2 object code itself)

That's kind of interesting. Call me a calculatin' fool, but I only
see 1,075,043 bytes' difference between case 1 and case 2. A hair
over 1M. So I have to ask you: David, is 1M of disk space really a
"whoops-time-to-fork-the-linux-kernel showstopper"? Really?

Another interesting question is whether we can get better space economy
in case 3. Basically this comes down to the question of whether the
object code of CML2-in-C can be made to fit in less than 1,596,880 bytes.

Assuming that CML2-in-C has to do what CML2-in-Python does and doing a
bit of long division, we find that CML2-in-C can have a smaller
minimum environment than CML1 only if the compression ratio of the
CML2-in-Python vs. CML2-in-C code is less than 9.

If we assume that CML2-in-C is allowed to have merely a smaller footprint
than CML2-in-Python, the ratio changes to 15 to 1.
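
The long division can be sketched as follows. The environment totals are taken as quoted in this message; the assumption, labeled in the code, is that each ratio is the byte budget left for the C object code divided by the size of the CML2 tools in Python. Variable names are illustrative only.

```python
# Figures as quoted in this message.
cml1_env = 4_229_375          # Case 1
cml2_python_env = 5_304_418   # Case 2
cml2_c_partial = 2_632_495    # Case 3, without the object code
cml2_python_tools = 177_143   # CML2 tools (with generated pyc files)

# Byte budget left for the CML2-in-C object code in each comparison.
budget_vs_cml1 = cml1_env - cml2_c_partial
budget_vs_python = cml2_python_env - cml2_c_partial

# Assumption: ratio = budget / size of the Python tools.
ratio_vs_cml1 = budget_vs_cml1 / cml2_python_tools
ratio_vs_python = budget_vs_python / cml2_python_tools

print(f"Budget to beat CML1:   {budget_vs_cml1:,} bytes, ratio {ratio_vs_cml1:.2f}")
print(f"Budget to beat Python: {budget_vs_python:,} bytes, ratio {ratio_vs_python:.2f}")
```

Under that reading the ratios come out just above 9 and just above 15.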

That is, to believe that CML2-in-C is a good idea on
total-size-of-environment grounds, we absolutely need to believe that
we can express every line of Python in CML2-in-Python in fifteen or
fewer lines of C. I think it shouldn't take anyone more than ten
minutes of reading the Python code to dispel *that* illusion.

These numbers and ratios are not very sensitive to the largest changes
in CML2's size that I can imagine at this point. I've calculated and
checked; even if CML2-in-Python doubles in size (wildly unlikely at
this point) the resulting space penalty with respect to CML1 would
still be less than 1.1M.

Conclusion: Those of you who are obsessing about Python bloating the
minimum build environment should take a chill pill. Or three. The
generated Tk files are bad enough space hogs that you end up fussing
over a *single megabyte*, fer chrissakes! And it doesn't even have to
live on the target machine...

Now let's get back to the *real* problems, shall we?

-- 
		<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

It would be thought a hard government that should tax its people one tenth part. -- Benjamin Franklin
