Re: 2.1GB File Size Limit

Albert D. Cahalan (acahalan@cs.uml.edu)
Fri, 19 Feb 1999 23:41:01 -0500 (EST)


Alexander Kjeldaas writes:

> with the modification that I use the following stat64 structure
> (in linux/include/asm-i386/stat.h). The kernel and glibc have
> incompatible structures and instead of translating, I changed the
> kernel structure:

Don't do that. The glibc people made several mistakes, so they will
need to change their struct stat anyway.

> struct stat64 {
> unsigned long long st_dev;

With an 8-bit minor and a 56-bit major. ROTFL. We don't need this.

Digital UNIX (soon to be "Tru64 UNIX") has this:

typedef int dev_t; /* device number (major+minor) */
#define major(x) ((major_t) (((dev_t)(x)>>20)&07777))
#define minor(x) ((minor_t) ((dev_t)(x)&03777777))
#define makedev(x,y) ((dev_t) (((major_t)(x)<<20) | (minor_t)(y)))

That is, a 32-bit dev_t with a 12:20 split. It is excellent.

> unsigned short __pad1;

To mess up alignment? You get a compiler-dependent gap I guess.

> unsigned long st_ino;

This is not large enough. Inode numbers must be 64-bit on i386 too.
Many filesystems need this, including UDF. Tree-structured filesystems
like Reiserfs and XFS work much better with huge inode numbers.

> unsigned int st_mode;
> unsigned int st_nlink;

No need, but it would be faster on older Alpha hardware. We could really
use a data type that is 16-bit on i386 and 32-bit on Alpha.

> unsigned int st_uid;
> unsigned int st_gid;

OK.

> unsigned long long st_rdev;

Again, 56-bit major numbers and 8-bit minor numbers are moronic.
We really don't need this bloat.

> unsigned short __pad2;

To mess up alignment? You get a compiler-dependent gap I guess.

> unsigned long long st_size;
> unsigned long st_blksize;
> unsigned long long st_blocks;

OK on i386, but these really should have fixed sizes.

> unsigned long st_atime;
> unsigned long __unused1;
> unsigned long st_mtime;
> unsigned long __unused2;
> unsigned long st_ctime;
> unsigned long __unused3;

Ah, the traditional empty space. :-)

The first thing wrong here is the variable size. Choose __u32 or __u64
as needed, not something that varies by architecture.

The second thing missing is intent, which ought to at least be documented.
I've heard that BSD had planned for the 32-bit overflow, then someone saw
the reserved space and grabbed it for nanoseconds.

Assuming we don't care about files older than 1970, the kernel API is
just fine. The C library can pad with zeroes or interpret the standard
requirement of (time_t)(-1) as just that: unsigned is OK.

The layout above is only suitable for little-endian hardware.
We won't be able to switch to a nice 64-bit representation with it.
(imagine that jiffies were stored as two 16-bit values -- ugh!)

In any case, the kernel should not be doing division etc. to convert.
If libc wants a stupid low-performance struct for time, libc can damn
well do the conversion by itself. Time conversion in kernel mode is
just bad.

Units of 1/4294967296 seconds are most natural. That lets us have a
64-bit integer time while still getting plain seconds very quickly.

If a larger time range is desired, use a 24-bit fraction.

For greatest compatibility, use 1/1000000000 or 1/10000000 second units.
This is very nice to use, but somewhat slow for getting plain seconds.

> unsigned long __unused4;
> unsigned long __unused5;
> };

Perhaps this is not enough. Flags, inode version, dtime, author,
alternate permission bits... I suggest at least 6 __u32 slots.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/