Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Andries.Brouwer@cwi.nl
Wed, 20 Aug 1997 15:51:51 +0200 (MET DST)


Alex Belits:

: really? If I'll use filenames how they are represented in filesystem (say,
: UTF-8) and the rest of text in 16-bit encoding, it will be impossible to
: edit unless the editor knows where filenames are

: Because everyone will have to encode/decode them on every file operation
: if they really want to preserve system calls. Should I explain why it's
: important to have trailing 8-bit zero in all strings that are passed to
: kernel? But UTF-8 is unusable as the internal format -- even regexps on it
: will become a monster. So, again hello horrible 8 -> 16 -> 8
: conversions on every operation with text...

Not so pessimistic...
Plan 9 did everything internally in 16 bits, and the conversion
was not very difficult.
On the other hand, when I was working on this stuff (two years ago?)
I used UTF-8 internally, which has the big advantage that nobody notices
that it is not ASCII or Latin-1 and almost no conversion is needed.
It is very easy to teach regexp routines which bytes start a character.
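
To illustrate that last point, here is a minimal sketch in C (not the
code I used back then; the function names are made up for this example,
and well-formed UTF-8 input is assumed). In UTF-8 every continuation
byte has the form 10xxxxxx, so any other byte starts a character:

```c
#include <stddef.h>

/* Hypothetical helper: a byte of the form 10xxxxxx continues a
   character; every other byte (ASCII or a lead byte) starts one. */
static int utf8_is_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Count characters, not bytes, in a NUL-terminated UTF-8 string.
   Assumes the input is well-formed UTF-8. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (utf8_is_start((unsigned char)*s))
            n++;
    return n;
}

/* Step over one character, as a regexp "." would have to do.
   The terminating NUL counts as a start byte, so the loop stops there. */
static const char *utf8_next(const char *s)
{
    if (*s == '\0')
        return s;
    s++;
    while (!utf8_is_start((unsigned char)*s))
        s++;
    return s;
}
```

Since no byte inside a multi-byte sequence is ever zero, the string
stays NUL-terminated and can be handed to the kernel unchanged.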

Andries