Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Andries.Brouwer@cwi.nl
Tue, 26 Aug 1997 16:54:21 +0200 (MET DST)


Matthias Urlichs:

: Anyway, the kernel will use UTF-8 (or Latin-1) for file names simply
: because nothing else works.

Ach, all this nonsense written by otherwise good people.

A character set is a mapping relating codes to symbols.
Symbols are abstract entities that live in people's minds,
not in the kernel.

Users present the ext2 filesystem with byte sequences,
and the kernel just uses these byte sequences as filenames.
(Only the bytes '/' and 0 are special as separator and terminator.)
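
To make the rule concrete, here is a minimal sketch in Python. The
helper name is_valid_component is hypothetical and is not kernel code;
it only restates the rule above, and it ignores length limits and the
reserved names "." and "..".

```python
def is_valid_component(name: bytes) -> bool:
    """Return True if `name` could serve as a single file name.

    Hypothetical helper that restates the rule in the text; it does not
    check length limits or the reserved names "." and "..".
    """
    # '/' separates path components and byte 0 terminates the string.
    # Every other byte value is acceptable, whether or not the sequence
    # means anything in any character set.
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


print(is_valid_component(b"\xff\xfe\x80"))  # True: no character set needed
print(is_valid_component(b"a/b"))           # False: contains the separator
```

Note that the check never asks which character set the bytes belong to.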

The ext2 filesystem does not use UTF-8, it does not use Latin-1,
it does not use any character set at all; it has file names that
are sequences of bytes.
For some users these byte sequences may become meaningful
if they interpret them as coding symbols in some character set,
like ASCII or koi-8 or vniscii.
These users may even see the glyphs that they usually associate
with these symbols if they use some appropriate font.
All this is entirely up to the user, and not a kernel matter.
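
As an illustration, the sketch below takes one four-byte name and reads
it under three different assumptions. The byte values are chosen for
the example, and KOI8-R stands in for "koi-8" as one variant of that
family; neither comes from the discussion itself.

```python
# One file name, as the filesystem stores it: four bytes, nothing more.
name = bytes([0xC6, 0xC1, 0xCA, 0xCC])

# A Latin-1 user sees four accented capitals: U+00C6 U+00C1 U+00CA U+00CC.
as_latin1 = name.decode("latin-1")

# A KOI8-R user sees the Russian word for "file": U+0444 U+0430 U+0439 U+043B.
as_koi8r = name.decode("koi8_r")

# A UTF-8 user sees nothing meaningful: 0xC6 announces a two-byte
# sequence, but 0xC1 is not a continuation byte.
try:
    name.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_latin1), repr(as_koi8r), valid_utf8)
```

The stored bytes are identical in all three cases; only the reader's
choice of character set differs.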

This is the Unix point of view.
For some non-ext2 filesystems matters are a bit different.

Andries