Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Alex Belits (abelits@phobos.illtel.denver.co.us)
Tue, 19 Aug 1997 00:16:08 -0700 (PDT)


On Mon, 18 Aug 1997, Teunis Peters wrote:

> Beyond that the Chinese still (AFAIK) decided whether or not to actually
> USE unicode [the language has other ways of creating new characters - this
> is not something computers are good at handling], Unicode has largely been
> accepted [mostly by fiat].

AFAIK, the Chinese, Japanese and Russians _oppose_ Unicode, which is mostly
pushed by people who use iso8859-1 anyway and thus have a trivial mapping
between their native charset and Unicode.

>
>
> Personally I think Unicode is a really good idea... I like the idea of
> being able to put descriptive filenames in files.
> sometimes the native language [eg Japanese] is the only way to describe a
> file.

...and people have been using native charsets/encodings for a long time
already -- and then Unicode appeared to "make it possible". Traditionally
everything network-related was supposed to either be ASCII-only or use MIME
charset definitions. It worked fine in Russia and Japan (I have no
information about China or Korea), but now Unicode supporters are trying to
push "mandatory" Unicode into HTML. They completely ignore that HTTP is
never used without HTTP headers (or META tags), and everyone learned how to
add charset tags there long ago. They are doing the same for FTP, even
though the only two known platforms that "support" Unicode at the
filesystem level are Windows NT and plan9, while the others have absolutely
no means even to provide reliable translation to the "local" charsets,
because "local" may be different for different users on the same box -- a
concept completely unknown to the Windows FTP server authors who support
that in the FTP-WG mailing list.
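
The charset tagging mentioned above can be sketched in a few lines of
Python. This is only an illustration under assumptions: the helper name
decode_body, the sample header value and the sample text are made up for
the example, and a real client would also have to look at META tags.

```python
from email.message import Message


def decode_body(content_type: str, body: bytes, default: str = "us-ascii") -> str:
    """Decode a body using the charset parameter of a Content-Type value.

    Hypothetical helper: falls back to `default` when no charset is tagged.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter, or the fallback.
    charset = msg.get_content_charset(default)
    return body.decode(charset)


# A KOI8-R page tagged in the header needs no Unicode on the wire.
raw = "Привет".encode("koi8-r")
text = decode_body("text/html; charset=koi8-r", raw)
```

The bytes on the wire stay in the local encoding; only the tag tells the
receiver how to read them.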

> Not that it matters but I think as long as filenames from 16bit+
> filesystems should be encoded into UTF-8 before being passed to the user.

...thus requiring them to be distinguished from "normal" data everywhere,
and breaking every piece of software that has to treat data in files as
filenames (say, "make").
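
A rough Python sketch of why this breaks things, assuming a Makefile typed
in KOI8-R and a hypothetical layer that hands filenames back as UTF-8 (the
file name and the helper same_name are invented for the example):

```python
def same_name(a: bytes, b: bytes) -> bool:
    """Compare two filenames the way a byte-oriented tool would."""
    return a == b


name = "файл.c"
in_makefile = name.encode("koi8-r")  # bytes as typed in the (assumed) Makefile
from_fs = name.encode("utf-8")       # bytes a UTF-8-translating layer would return

# Same text, different bytes: a plain byte comparison no longer matches.
matches = same_name(in_makefile, from_fs)
```

A tool that compares bytes cannot tell that the two names are the same
text unless it knows which encoding each side uses.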

> So what filesystems are dependent on what character set?
>
> FAT : 8-bit IBM-PC
> VFAT : 16-bit Unicode
> ext-2 : Latin-1? (though UTF-8 is supported)

8-bit, not Latin-1. Latin-1 (iso8859-1) is one of the charsets used with it.

It's hard to "not support" UTF-8 -- to the filesystem it's just 8-bit
bytes. 8-bit names are already used for local encodings, and there is
_NO_SUCH_PROBLEM_ as "foreign language support" in an 8-bit-clean
filesystem. The only two problems that exist and can be solved by Unicode
are:

1. charset tagging (Unicode is so large that it includes everything, or at
least its authors think so);
2. compatibility with Windows NT.

For the first one, the solution creates more problems than it solves (such
as breaking everything that treats data as filenames unless everyone
switches to 16-bit Unicode or, worse, UTF-8, which is the last thing any
sane non-English-speaking person will do to his language). The second one
is not something I care about (non-iso8859-1-speaking Windows NT users
aren't that fond of Unicode anyway).
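
To illustrate the 8-bit-clean point above, here is a minimal Python sketch.
It assumes a Linux-style filesystem that stores names as raw bytes (as ext2
does); the helper roundtrip_names and the sample name are hypothetical, and
other platforms may reject or alter such names.

```python
import os
import tempfile


def roundtrip_names(names):
    """Create empty files named by raw byte strings and list them back."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)  # work with bytes paths throughout
        for raw in names:
            with open(os.path.join(tmp_b, raw), "wb"):
                pass
        # Listing a bytes path returns the names as bytes, untouched.
        return sorted(os.listdir(tmp_b))


title = "файл.txt"
raw_names = [title.encode("koi8-r"), title.encode("utf-8")]
stored = roundtrip_names(raw_names)
```

Both names come back byte for byte; the filesystem neither knows nor cares
which charset the user had in mind.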

--
Alex