Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Erik Corry (erik@arbat.com)
Thu, 21 Aug 1997 02:18:32 +0200 (MET DST)

Previous message: H. Peter Anvin: "Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish"

> The problem is that lots of the decisions on asian encodings in Unicode
> were made by non-asians. Many of the decisions on character unifications
> are non-sensical.
>
> Basically Japanese look at Unicode as something forced upon them by
> idiots who only understand single byte encodings and european languages
> with latin based character sets.

People interested in this(!) should use Deja News to take a
look at the long discussion from 1995 that started off as
"Fuck the Asians" and ended up being called "Han unification
in Unicode". For example, take a look at the posting by
Glenn A. Adams under
<http://xp5.dejanews.com/getdoc.xp?recnum=2004085&search=thread&threaded=1&server=db95q23>
and the postings by Martin Duerst on the subject.

The impression I got from the whole discussion is that the
Japanese objections to Han unification are not tenable, and
that Han unification was a process initiated in the Far East
and approved by JISC, the Japanese standards organisation.

Most people have objections to decisions made in Unicode. This
is inevitable in a standard of this size, on a subject that
raises such emotions. Many of the perceived inconsistencies
arise from the need to convert losslessly from other encodings
to Unicode and back.

> From what I understand the Russians hate Unicode because their nice simple
> single-byte KOI8 encoding became mangled into double bytes.

Inconvenient. The same thing happened to the non-ASCII half of
Latin-1 in UTF-8.
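
To put rough numbers on that, here is a small Python sketch. The
sample strings are my own, and I am assuming KOI8-R as the KOI8
variant in question (the quoted text only says "KOI8").

```python
# Compare the encoded size of the same text in a legacy single-byte
# encoding and in UTF-8. KOI8-R is an assumption; the sample words
# are illustrative only.
russian = "Привет"  # 6 Cyrillic letters
german = "Grüße"    # 5 characters, 2 of them outside ASCII

koi8_len = len(russian.encode("koi8_r"))     # one byte per letter
utf8_ru_len = len(russian.encode("utf-8"))   # two bytes per Cyrillic letter
latin1_len = len(german.encode("latin-1"))   # one byte per character
utf8_de_len = len(german.encode("utf-8"))    # ASCII stays one byte, ü and ß take two

print(koi8_len, utf8_ru_len, latin1_len, utf8_de_len)
```

Pure Cyrillic text doubles in size, while mostly-ASCII Latin-1 text
grows only by one byte per accented character.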

> And it's not even sorted in Russian alphabetic order!

That seems a little gratuitous, even given their (correct)
opinion that alphabetical ordering problems cannot be solved
merely by picking the correct encoding.

There is no character set encoding in which the German characters
are in alphabetical order. And if there were, those characters
would be out of order in Swedish, which has the same characters
but orders them differently. You get used to it.
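
A rough sketch of the German/Swedish conflict in Python. Both key
functions are hypothetical simplifications of my own (German
dictionary order treating the umlauts like their base letters,
Swedish placing å, ä, ö after z); they are not real locale
collation, and the word list is illustrative only.

```python
# Two hand-written sort keys showing that one fixed code order
# cannot be alphabetical for both languages at once.
words = ["zon", "ärm", "arm", "öl", "ost"]

GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def german_key(word):
    # Simplified dictionary order: umlauts sort with their base letter,
    # the original spelling only breaks ties.
    return (word.translate(GERMAN_FOLD), word)

def swedish_key(word):
    # Simplified Swedish order: å, ä, ö are separate letters after z.
    return [SWEDISH_ALPHABET.index(ch) for ch in word]

german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)

print(german_order)
print(swedish_order)
```

The same five words come out in two different orders, so no single
assignment of code values can satisfy both.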

For that matter, ASCII is not in alphabetical order. For that,
the order would have to be AaBbCcDdEeFfGgHhIi etc.
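
The ASCII point is easy to check. A minimal Python sketch, with a
word list and a `dictionary_key` helper that are my own inventions
for illustration:

```python
# Sorting by raw code value puts every uppercase letter before every
# lowercase one, because 'A'..'Z' are 65..90 and 'a'..'z' are 97..122.
words = ["banana", "Zebra", "apple", "Apple", "zebra", "Banana"]

by_code_point = sorted(words)

def dictionary_key(word):
    # Primary key ignores case; the tiebreak puts uppercase before
    # lowercase, which gives the AaBbCc... order.
    return (word.lower(), [ch.islower() for ch in word])

by_dictionary = sorted(words, key=dictionary_key)

print(by_code_point)
print(by_dictionary)
```

Getting the second ordering takes a comparison rule on top of the
encoding, which is the point: ordering is not something the code
values alone can provide.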

> Chinese primarily use BIG5 encoding. At least, they do on the web :-)

This is their prerogative. However, BIG5 is not suitable as an
interchange encoding for Linux (at least not outside China).

-- 
Erik Corry erik@arbat.com http://inet.uni-c.dk/~ehcorry/
