:: > The OSTA-UDF(tm) filesystem I'm working on supports compressed Unicode.
:: > Basically, the first byte is a flag indicating how to expand the following
:: > bytes:
:: > 8 = high byte is 0 and low byte is from data stream
:: > 16 = high byte is followed by low byte in the data stream
:: > By the ISO standards, this is CS0 or a character set defined by agreement.
::
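For reference, the expansion rule quoted above can be sketched roughly
as follows. This is only an illustration in Python, based on the
description in this message and nothing else; the function name
expand_compressed_unicode is made up here, and the real OSTA-UDF
specification may define further flag values or details not covered.

```python
def expand_compressed_unicode(data: bytes) -> str:
    """Expand a flag-prefixed byte string as described in the quoted text.

    Hypothetical helper: the first byte is the flag, the rest is payload.
    """
    if not data:
        return ""
    flag, payload = data[0], data[1:]
    if flag == 8:
        # High byte is 0, low byte comes from the data stream.
        return "".join(chr(b) for b in payload)
    if flag == 16:
        # High byte is followed by low byte in the data stream.
        if len(payload) % 2 != 0:
            raise ValueError("16-bit payload has an odd number of bytes")
        return "".join(
            chr((payload[i] << 8) | payload[i + 1])
            for i in range(0, len(payload), 2)
        )
    raise ValueError("unknown flag byte: %d" % flag)
```

With flag 8 every character costs one byte but only values 0..255 can
be stored; with flag 16 every character costs two bytes.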
:: Not yet another Unicode encoding format?! What's wrong with
:: UTF-8? Not Invented Here?
: UTF-8 maps Unicode to a font as Unicode does not specify how a character
: appears, but rather Unicode differentiates characters from each other.
Strange for someone who works on Unicode not to know
what UTF-8 is. It is a variable-length byte encoding
for Unicode (and similar codes, like ISO 10646), with the property
that the ASCII subset of Unicode is mapped with the usual single-byte
values, while these ASCII bytes, in particular NUL and '/', cannot
be part of the multi-byte representation of other Unicode values.
Details can be found in the keyboard and console drivers.
Note that as a consequence each ASCII file, like this message,
is also a UTF-8-coded Unicode file.
UTF-8 has nothing to do with fonts or display. It is just
a way of representing the bits.
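
To make the properties above concrete, here is a small sketch of the
encoding in Python. It is an illustration only, not the code from any
driver: the name utf8_encode is made up here, and it handles only
values up to U+FFFF (one to three bytes), leaving out the longer forms.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point in the range 0..0xFFFF as UTF-8 (sketch)."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        # ASCII: a single byte with the usual value.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    raise ValueError("values above 0xFFFF are not handled in this sketch")


def contains_ascii_byte(encoded: bytes) -> bool:
    """True if any byte is below 0x80, i.e. could be mistaken for ASCII."""
    return any(b < 0x80 for b in encoded)
```

Every byte of a multi-byte sequence has its high bit set, so
contains_ascii_byte() is false for anything outside ASCII. That is
why NUL and '/' can never turn up inside the representation of
another character, and why a plain ASCII file is unchanged.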
Andries