Re: Filesize limitation

Richard B. Johnson (root@chaos.analogic.com)
Mon, 3 Nov 1997 18:04:17 -0500 (EST)


On Mon, 3 Nov 1997, Andre Uratsuka Manoel wrote:
[SNIPPED]

> I didn't say it correctly the first time. The file I was
> reported as having problems being created was slightly larger than 2GB.
> That file is generated every month and every month it gets bigger and
> bigger. In about 6 months it will probably not fit into 4 GB either.
>
[SNIPPED]
There appears to be something fundamentally wrong with a program
that uses such a data file.

In the late '60s IBM created a sort-merge procedure, first used to
sort the Chicago telephone directory. It became known as the "Chicago
Sort". It ran on an IBM-360 with 4 kilobytes of core-RAM which had
to contain both the program and the data. It works.

A few weeks ago, I attempted to use the M$Garbage Editor to edit a 130
kilobyte text file. It reported "out of RAM" and exited. The two stories
are related.

Until software starts being written by Software Engineers, who are
trained in engineering disciplines, we will continue to have data expand
like gas to fill all available space. If the space isn't big enough,
the programs will crash.

Given the current tendency to throw RAM and Disk Drives at a problem,
it is unlikely that even 64 bits will be good enough in the near future.
This, in spite of the fact that 64 bits exceeds the dynamic range of
the universe (233 dB +/- 20 dB).
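
As a rough check of that comparison, the short sketch below converts a
bit width to decibels. It assumes the amplitude convention
(dB = 20 * log10(ratio)), which the text above does not state; the
function name bits_to_db is made up for illustration.

```python
import math

def bits_to_db(bits: int) -> float:
    # Assumption: amplitude convention, dB = 20 * log10(ratio),
    # where ratio = 2**bits is the span of an unsigned integer.
    return 20.0 * bits * math.log10(2.0)

for width in (32, 64):
    print(width, "bits is about", round(bits_to_db(width), 1), "dB")
```

Under that assumption each bit is worth about 6.02 dB, so 32 bits falls
short of the 233 dB figure and 64 bits clears it even with the 20 dB
margin added.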

Even my Sparc won't help. An 'int' on the Sparc is 32 bits. Even if
you find a 64-bit architecture, that doesn't mean that its file-systems
will support the kind of file sizes that you propose.
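
To put numbers on the 2 GB and 4 GB figures in this thread, here is a
small sketch of the largest byte offset an integer of a given width can
hold. It assumes the file offset is stored in a plain signed or unsigned
integer; max_offset is a name invented for this example, not a real API.

```python
def max_offset(bits: int, signed: bool = True) -> int:
    # Largest byte offset representable in an integer of this width.
    # A signed integer gives up one bit to the sign.
    value_bits = bits - 1 if signed else bits
    return (1 << value_bits) - 1

GIB = 1 << 30  # bytes in one binary gigabyte

print(max_offset(32, signed=True))   # 2**31 - 1, just under 2 GiB
print(max_offset(32, signed=False))  # 2**32 - 1, just under 4 GiB
print(max_offset(64, signed=True))   # 2**63 - 1
```

A file "slightly larger than 2GB" is just past the signed 32-bit limit,
which fits the problem reported in the quoted message.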

The solution is to use files as files. They have names for very good
reasons. If a "master-file" is as long as you propose, it contains
too much information. Such a file should contain "keys" which allow
records existing in other file(s) to be sorted and merged without actually
having to copy any data. The records in the other file(s) should contain
the database information.
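
A minimal sketch of that idea follows, assuming fixed-length records and
an in-memory stand-in for the data file. The record layout, the sample
names, and the helpers build_index and read_sorted are all hypothetical,
made up for illustration. Only the small (key, offset) index is sorted;
the records themselves are never moved or copied within the data file.

```python
import io

def build_index(data, record_size, key_of):
    """Scan fixed-size records once; return a list of (key, byte offset)."""
    index = []
    offset = 0
    data.seek(0)
    while True:
        record = data.read(record_size)
        if len(record) < record_size:
            break  # end of file (or a trailing partial record)
        index.append((key_of(record), offset))
        offset += record_size
    return index

def read_sorted(data, record_size, index):
    """Yield records in key order by seeking to each stored offset."""
    for _key, offset in sorted(index):
        data.seek(offset)
        yield data.read(record_size)

# Hypothetical 16-byte records: 8-byte name followed by 8-byte number.
records = [b"Smith   555-0101", b"Adams   555-0199", b"Jones   555-0142"]
data = io.BytesIO(b"".join(records))  # stands in for a file on disk

index = build_index(data, 16, key_of=lambda r: r[:8])
ordered = list(read_sorted(data, 16, index))
for rec in ordered:
    print(rec.decode())
```

With several data files, each index entry could also carry a file name
next to the offset, so one key file can order records spread across
many smaller files.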

Cheers,
Dick Johnson

Richard B. Johnson
Project Engineer
Analogic Corporation
Penguin : Linux version 2.1.60 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.