Re: libc types [was: SCSI device numbering]

Stephen C. Tweedie (sct@dcs.ed.ac.uk)
Mon, 1 Jul 1996 15:40:06 +0100


Hi,

On Thu, 27 Jun 1996 17:48:19 +0200, Andries.Brouwer@cwi.nl said:

> Stephen C. Tweedie:
> :: Just do it already! :-) If we need a new fs type
> :: like ext3, then LET'S GET ON WITH IT! Sheesh! :-)

> : Easy to do. ...

> I am afraid the kernel patch is larger - with a 48-bit minor,
> one can no longer use the minor to index arrays, for example.

The point was that implementing code to store the larger dev_t in the
filesystem is easy, and that extending the kernel ABI is just a matter
of adding a newer, versioned interface to a very few syscalls. Once
the larger dev_t is in place, we can then look at how to take
advantage of it in the lower levels of the kernel, and admittedly this
gets complex.
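
To make the storage side concrete, here is a minimal sketch of a wider
device number in C. Only the 48-bit minor comes from this thread. The
64-bit total width, the 16-bit major, and every name (big_dev_t,
big_mkdev and so on) are assumptions made up for illustration, not
anything proposed in a patch. The old format is taken to be the
current 16-bit dev_t with an 8-bit major and an 8-bit minor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 16-bit major in the top bits, 48-bit minor below.
 * Only the 48-bit minor is mentioned in the thread; the rest is assumed. */
typedef uint64_t big_dev_t;

#define BIG_MINOR_BITS 48
#define BIG_MINOR_MASK ((UINT64_C(1) << BIG_MINOR_BITS) - 1)

/* Pack a major/minor pair into the wide device number. */
static inline big_dev_t big_mkdev(uint16_t major, uint64_t minor)
{
    assert(minor <= BIG_MINOR_MASK);   /* minor must fit in 48 bits */
    return ((big_dev_t)major << BIG_MINOR_BITS) | minor;
}

static inline unsigned big_major(big_dev_t dev)
{
    return (unsigned)(dev >> BIG_MINOR_BITS);
}

static inline uint64_t big_minor(big_dev_t dev)
{
    return dev & BIG_MINOR_MASK;
}

/* Widen an old 16-bit dev_t (8-bit major, 8-bit minor). Always lossless. */
static inline big_dev_t big_from_old(uint16_t old)
{
    return big_mkdev((uint16_t)(old >> 8), (uint64_t)(old & 0xff));
}

/* Narrow for the old ABI. Returns 0 on success, or -1 if the device
 * cannot be represented in 16 bits. */
static inline int big_to_old(big_dev_t dev, uint16_t *old)
{
    if (big_major(dev) > 0xff || big_minor(dev) > 0xff)
        return -1;
    *old = (uint16_t)((big_major(dev) << 8) | big_minor(dev));
    return 0;
}
```

The narrowing direction is the one the older, short-dev ABI would have
to handle: a device that does not fit has to be reported as an error
rather than silently truncated.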

However, it's something we want to do anyway, since there are a number
of things I'd like to do in the kernel which we can't do because we
don't have a universal way of interpreting the minor number as a
bitmap. In particular, I'd like to see regular kernel structures for
each hard drive, so that we can have per-drive IO request queues and
allow certain optimisations (such as improving the elevator algorithm,
which is currently suboptimal for some extended partition
arrangements, and letting swap striping run adaptively by swapping to
the device with the shortest request queue).
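
As a rough illustration of the adaptive swap idea, the selection step
might look something like the sketch below. The struct drive_queue
type and the pick_swap_drive() function are hypothetical; no such
per-drive structure exists in the kernel today, which is exactly the
problem.

```c
#include <stddef.h>

/* Hypothetical per-drive bookkeeping; not an existing kernel type. */
struct drive_queue {
    const char *name;      /* label for the drive, illustration only */
    unsigned    pending;   /* requests currently queued on this drive */
};

/* Return the index of the drive with the fewest pending requests.
 * Ties go to the lowest index. Returns -1 if there are no drives. */
static int pick_swap_drive(const struct drive_queue *drives, size_t n)
{
    size_t best = 0;
    size_t i;

    if (n == 0)
        return -1;
    for (i = 1; i < n; i++) {
        if (drives[i].pending < drives[best].pending)
            best = i;
    }
    return (int)best;
}
```

A real version would need locking and some notion of request cost,
but the point is that the choice becomes trivial once each drive has
its own queue to inspect.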

This is all non-urgent, though. The important thing is to synchronise
the changes in libc and in the kernel to the API and ABI themselves.
We've got a new major libc version coming up and we don't want to miss
that opportunity. Once the interface is stable, we can implement the
new policies around it.

> : We've talked about this before. The trouble is that it is NOT a
> : filesystem issue, it's a kernel/libc API issue, and _that_ is tricky.
> : Nobody has yet been brave enough to tackle it.
...
> : This is not an impossible task, but it's ugly. What we really need is
> : to synchronise libc and kernel updates for this kind of change.
> : libc-6 and linux-2.2, say. The kernel will have to provide the older,
> : short-dev ABI for a long time, but we can still mandate that the new
> : libc requires a newer kernel.

> Not so pessimistic. I plan to submit a series of patches to the kernel
> so that it (i) gets a kdev_t that is a pointer to a structure,
> (ii) gets new, versioned, stat and mknod system calls,
> (iii) learns about a new struct stat, with larger dev_t,
> where all of this works with the current libc.

Sounds good.

> Maybe you are interested in another change to struct stat as well:
> off_t st_size;
> allows only 2GB files (according to POSIX off_t is a signed type)
> - not very much. On http://www.sas.com/standards/ under
> `Large File Summit' a transition to larger file sizes is discussed.

Yes, I've been tracking that development for some time, and it looks
as if it has now been submitted to X/Open.

> Probably Linux should go the same way as Sun and several others.
> Such a change might or might not be done simultaneous with the
> change for a larger dev_t.

The LFS proposals involve two separate strands. The first is the
"small stream", in which we maintain the current API with its 32-bit
off_t but add extra libc entry points (read64, open64 etc.) which
operate on a larger version of off_t. That has to deal with a number
of extra error conditions which occur when, say, a 32-bit stat() call
is made to a file whose size is larger than 2GB. The second proposal
is to make off_t truly 64 bits wide. LFS proposes a standard way of
selecting, at compile time, which version we want. So, there will be
a certain amount of hacking necessary at the libc level to get this
working, but it shouldn't be too hard. David Mosberger has already
been working on getting the kernel's internal VFS layer fully 64-bit
clean on the AXP.
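
The extra error condition in the "small stream" case amounts to a
range check when a 64-bit size is handed back through a 32-bit
interface. A minimal sketch, assuming the kernel holds sizes as 64-bit
values internally: small_off_t and narrow_size() are hypothetical
names, and EOVERFLOW is the errno value the LFS specification assigns
to this case.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the 32-bit signed off_t of the existing API. */
typedef int32_t small_off_t;

/* Copy a 64-bit file size into a 32-bit off_t for an old-style stat().
 * Returns 0 on success, or -EOVERFLOW if the size does not fit. */
static int narrow_size(int64_t size, small_off_t *out)
{
    if (size < 0 || size > INT32_MAX)
        return -EOVERFLOW;      /* larger than 2GB-1: not representable */
    *out = (small_off_t)size;
    return 0;
}
```

With a truly 64-bit off_t the check disappears, which is one reason
the second strand is the cleaner of the two wherever it is possible.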

Cheers,
Stephen.

--
Stephen Tweedie <sct@dcs.ed.ac.uk>
Department of Computer Science, Edinburgh University, Scotland.