Re: unicode

Theodore Y. Ts'o (tytso@MIT.EDU)
Fri, 15 May 1998 14:55:02 -0400


Date: Fri, 15 May 1998 15:42:34 +0200 (MET DST)
From: dwguest@win.tue.nl (Guest section DW)

What was the topic? Ted stated that he thought it a good idea
to agree that filenames were in UTF-8.
I refuted that - the filesystem stores bytes, both in files
and in filenames, nothing more - it is the task of a higher
level to worry about languages, character sets and religious
beliefs of the user.
Since nobody contradicted, maybe we now all agree on this part.

You've got to be kidding. Most of this thread was a contradiction of
your argument. For interoperability reasons, we need to know what a
filename means. Otherwise, bad things will happen when you take an ext2
filesystem and move it to another system using a different convention. The
contents of the files are the problem of the applications; however, the
meaning of the filenames is very much ext2's business. This is why
NTFS states that its filenames are to be stored in Unicode. Going back
in history, IBM mainframe systems specify that their filenames are in
EBCDIC. Old PDP-9/15 DECtapes specified that directory names were
stored in a packed 6-bit ASCII form. It has always been the case that
filesystems defined the character set and encoding used by filenames;
that is part of the scope of the filesystem's definition.
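To make the interoperability point concrete, here is a small Python sketch. It is not ext2 code; the filename bytes and the helper `interpret` are hypothetical. It decodes one set of filename bytes under three conventions: UTF-8, ISO 8859-1 (Latin-1), and EBCDIC code page 037 as implemented by Python's `cp037` codec. The bytes on disk never change; only the agreed convention decides whether the user sees `café.txt` or something else.

```python
# Hypothetical example: the same on-disk filename bytes, read under
# three different encoding conventions. Nothing here is ext2-specific.
raw_name = b"caf\xc3\xa9.txt"  # bytes as a directory entry might store them


def interpret(name: bytes, encoding: str) -> str:
    """Decode filename bytes under one convention, substituting U+FFFD for
    undecodable bytes instead of raising."""
    return name.decode(encoding, errors="replace")


# cp037 is one EBCDIC code page; latin-1 maps every byte to a code point.
for enc in ("utf-8", "latin-1", "cp037"):
    print(f"{enc:8} -> {interpret(raw_name, enc)!r}")
```

Read as UTF-8 the name is `café.txt`; read as Latin-1 the same bytes become `cafÃ©.txt`, and under EBCDIC even the ASCII letters are scrambled. A filesystem that records which convention applies lets every system make the same choice.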

And for ext2, the default filename encoding *will* be UTF-8. Now then,
can we get back to work? Even with this point settled, there's still a
lot of work to be done if we want Linux to have true
internationalization support.
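As a rough illustration of what a UTF-8 default means in practice, the sketch below layers a UTF-8 check on top of the byte-level rules Linux already enforces for a single name component: no `/`, no NUL, and (for ext2) at most 255 bytes. The function `check_filename` is hypothetical; the kernel does not perform the UTF-8 step, so this is the kind of check a userspace tool might apply, not ext2 behaviour.

```python
from typing import Optional

EXT2_NAME_LEN = 255  # ext2's maximum length of one name component, in bytes


def check_filename(name: bytes) -> Optional[str]:
    """Return None if `name` is acceptable, else a short reason string.

    The first two checks mirror byte-level rules Linux already enforces;
    the UTF-8 check is the additional convention, assumed here to be
    applied by a userspace tool rather than by the filesystem.
    """
    if not name or len(name) > EXT2_NAME_LEN:
        return "length out of range"
    if b"/" in name or b"\x00" in name:
        return "contains '/' or NUL"
    try:
        name.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError as exc:
        return f"not UTF-8 at byte {exc.start}"
    return None
```

For example, a migration tool could run this over `os.listdir(b".")`, which returns raw byte names, and report entries such as the Latin-1 name `b"caf\xe9.txt"` that would not survive a move to a system that assumes UTF-8.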

- Ted
