Re: us.kernel.org mirroring inconsistency

Matti Aarnio (matti.aarnio@sonera.fi)
Tue, 5 Jan 1999 16:27:56 +0200 (EET)


Riley Williams <rhw@bigfoot.com> writes:
> Hi Matti.
...
> > That is foolish, in fact I think the BZ2 at this point is foolish
> > thing to do as the only compression algorithm. In case of
> > ftp.kernel.org archive:
> > gz 2.4 GB
> > bz2 2.1 GB
> > That is, the 'bz2' version of compression is only about 1/8:th
> > denser, than gzip.

As long as there is no WIDELY DEPLOYED 'tar' program with
support for BZ2, I don't think it to be an easy replacement
for GZ. Everything else is mostly smoke-screen...
Until then, GZ is the easy to use form, and BZ2 is a neat
alternate which may or may not succeed to replace gzip's
current primary algorithm.

If g(un)zip developers become activated again, we could
yet see a new version of (at least) gunzip which can handle
BZ2, and thus all complaining is over. After all, gunzip
can already open BSD compress, and (SysV?) pack files!

Once that happens, and has been deployed for at least 6 months,
we can scrap the GZ format in favour of BZ2.

> Try 1/7th, or 21% (at least based on your figures)...

Depending which you use divider.
In reality the difference is smaller as the comparison
*should* be done against the raw uncompressed material.

# ls -l linux-2.2.0-pre4.tar*
-rw-rw-r-- 1 11 11 54958080 Jan 5 14:09 linux-2.2.0-pre4.tar
-rw-rw-r-- 1 11 11 10494542 Jan 5 14:06 linux-2.2.0-pre4.tar.bz2
-rw-rw-r-- 1 11 11 12969415 Jan 5 14:12 linux-2.2.0-pre4.tar.gz

bz2: size 19.096% of original
gz: size 23.599% of original
difference: 4.503% of original

# time bunzip2 -c linux-2.2.0-pre4.tar.bz2 > linux-2.2.0-pre4.tar
real 0m33.806s
user 0m31.706s
sys 0m1.692s
# time gunzip < linux-2.2.0-pre4.tar.gz > tmp.tmp
real 0m8.801s
user 0m5.278s
sys 0m1.456s

The beast clearly needs lots of tuning, as bunzip2 is slower
than gunzip by factor 3.8 (but your CPU is free, isn't it ?)

# time gzip -9 < linux-2.2.0-pre4.tar > linux-2.2.0-pre4.tar.gz
real 1m38.052s
user 1m33.668s
sys 0m1.856s
# time bzip2 -9 < linux-2.2.0-pre4.tar > tmp.tmp
real 2m23.211s
user 2m20.769s
sys 0m1.174s

Compression with bzip2 is slower by about factor 1.5 ...
(Now that is surprising when compared with decompression!)

> > Way back when the BSD compress got into trouble with LZW patents,
> > the rapid move to GNU-zip was well founded, but now such a thing is
> > not really warranted.
>
> Obviously you don't have to pay for your time connected to the
> Internet! I do, as do most others in the UK and several other
> countries!!!

I do have many hats, when I speak about ftp.funet.fi, it appears
to be an Academic hat. I can also speak with commercial ISP hat..
(and of course in neither case am I spokesperson for them!)

Commercial dialup users are now getting flat-rate service at
about 8.50 Euros per month, so I won't care about their con-
nection times at all. (I do access services at my work, so I
should know.. If they are not downloading, they are browsing
the web and idling, or - heaven forbid - just idling...)

In every case people should *not* be downloading every full
dump, rather only those patch-files from the last full level
they have, up to the lattest they are interested in.
(Only every 10 levels download the full file, if that often.)

-rw-r--r-- 1 502 101 119828 Dec 28 22:32 patch-2.2.0-pre1.bz2
-rw-r--r-- 1 502 101 136849 Dec 28 22:32 patch-2.2.0-pre1.gz
-rw-r--r-- 1 502 101 116151 Dec 31 07:09 patch-2.2.0-pre2.bz2
-rw-r--r-- 1 502 101 132877 Dec 31 07:09 patch-2.2.0-pre2.gz
-rw-r--r-- 1 502 101 19098 Jan 1 20:58 patch-2.2.0-pre3.bz2
-rw-r--r-- 1 502 101 20596 Jan 1 20:58 patch-2.2.0-pre3.gz
-rw-r--r-- 1 502 101 67341 Jan 3 03:25 patch-2.2.0-pre4.bz2
-rw-r--r-- 1 502 101 74797 Jan 3 03:25 patch-2.2.0-pre4.gz

With these the 7-12% size difference really does not matter.
A 10 kilobyte difference means mere 2-10 seconds in download
time. If people are foolish enough to download 10+ megabytes
at every snapshot over a slow modem, nothing can help it.

> ===8<=== Irrelevant comments cut ===>8===
> > This thinking is basis on why ftp.funet.fi does not mirror BZ2
> > files if the same files are also available as GZ files.
>
> What's the access rate to ftp.funet.fi from sites that have to pay for
> their connections?

ftp.funet.fi has FDDI, which is connected to FUNET core
routers. From those routers:

STM-1/POS to NORDUnet@Stockholm, and from there to USA.

STM-1/ATM to academic campus lans.

STM-1/ATM over FICIX (FICIX is STM-1/ATM switch) to other
nationally interconnected service providers, then various
speeds from 64k up to STM-16/POS to their customers (and
customers^2, etc..) To ISP like sites the national cross-
connectivity capacity usually exceeds 2Mbps. International
capacity costs way more, like 10 times more, than the equi-
valent national transit capacity...
(Indeed I am aware of the price differences in data services
in between Finland, and middle europe..)

> Best wishes from Riley.
> ---
> * ftp://ps.cus.umist.ac.uk/pub/rhw/Linux
> * http://ps.cus.umist.ac.uk/~rhw/kernel.versions.html

/Matti Aarnio <mea@nic.funet.fi>
<matti.aarnio@sonera.fi>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/