Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

H. Peter Anvin (hpa@transmeta.com)
27 Aug 1997 06:28:29 GMT


Followup to: <199708270309.XAA01641@netcom.ca>
By author: "Andrew E. Mileski" <aem@netcom.ca>
In newsgroup: linux.dev.kernel
>
> There is only one real problem in all the kernel - the console.
>
> The filesystems (even ISO9660 level 1 which probably has the smallest
> charset), could all get along in a multi-byte environment. It wouldn't
> be portable of course - that can't be helped. As long as a charset
> translation is reversible, nothing else really matters.
>
> We could even specify a multi-byte separator (instead of '/') and
> terminator (instead of 0x00) by using an encoding like UTF-8 does,
> but Unicode doesn't have to be the charset used - it could be anything
> even Klingon, though you lose charset portability.
>
> The console is a problem because it has a fixed representation
> that cannot be mucked with. Example: a space has to look the
> same in all charsets, but may have different charset byte values!
> The console charset is also locale specific.
>
> AFAIK, it is impossible to have a charset used by the entire kernel,
> that is not specific to the locale, unless translation is provided
> for the console.
>

I think this is an issue for the kernel only when it comes to foreign
filesystems like VFAT and NTFS, for which character set handling is
required at the filesystem level. VFAT and NTFS use UCS-2 as their
native charset, which cannot be passed to user space in Linux as-is:
UCS-2 names routinely contain 0x00 and '/' bytes inside 16-bit
characters, while the pathname interface is a NUL-terminated,
'/'-separated byte string. The logical thing to do is to convert UCS-2
to UTF-8 and back for those filesystems.
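A minimal sketch in C of the UCS-2 to UTF-8 round trip described above. The function names are hypothetical and are not taken from the kernel's VFAT or NTFS code. It assumes each UCS-2 code unit is treated as a plain 16-bit code point: the surrogate range is not rejected, so any 16-bit name survives the round trip, and the decoder only accepts 1- to 3-byte sequences because UCS-2 cannot represent anything longer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-2 code unit as UTF-8; returns bytes written (1-3). */
static size_t ucs2_to_utf8(uint16_t u, unsigned char *out)
{
    if (u < 0x80) {                     /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));   /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Decode one UTF-8 sequence back to a UCS-2 code unit.
 * Returns bytes consumed, or 0 for malformed or overlong input. */
static size_t utf8_to_ucs2(const unsigned char *in, size_t len, uint16_t *u)
{
    if (len >= 1 && in[0] < 0x80) {
        *u = in[0];
        return 1;
    }
    if (len >= 2 && (in[0] & 0xE0) == 0xC0 && (in[1] & 0xC0) == 0x80) {
        uint16_t v = (uint16_t)(((in[0] & 0x1F) << 6) | (in[1] & 0x3F));
        if (v < 0x80)
            return 0;                   /* overlong encoding */
        *u = v;
        return 2;
    }
    if (len >= 3 && (in[0] & 0xF0) == 0xE0 &&
        (in[1] & 0xC0) == 0x80 && (in[2] & 0xC0) == 0x80) {
        uint16_t v = (uint16_t)(((in[0] & 0x0F) << 12) |
                                ((in[1] & 0x3F) << 6) | (in[2] & 0x3F));
        if (v < 0x800)
            return 0;                   /* overlong encoding */
        *u = v;
        return 3;
    }
    return 0;                           /* not representable in UCS-2 */
}

/* Convert an n-unit UCS-2 name to a NUL-terminated UTF-8 string.
 * The caller must supply at least 3*n + 1 bytes (worst case). */
static size_t ucs2_name_to_utf8(const uint16_t *name, size_t n,
                                unsigned char *out, size_t outsz)
{
    size_t len = 0;
    assert(outsz >= 3 * n + 1);
    for (size_t i = 0; i < n; i++)
        len += ucs2_to_utf8(name[i], out + len);
    out[len] = '\0';
    return len;
}
```

Since every non-ASCII UTF-8 byte is 0x80 or above, the converted name never contains a stray 0x00 or '/' and can go through the normal pathname calls unchanged.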

For any POSIX-compliant filesystem this is a non-issue, since only the
null byte and the '/' byte matter. Any character set which never uses
those two byte values inside a multi-byte character is OK to use.
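To make the "only NUL and '/' matter" rule concrete, here is a small C sketch. The helper name and the sample byte arrays are illustrative only. It checks bytes alone ("." and ".." as whole names are a separate matter) and shows that a UTF-8 name passes while raw little-endian UCS-2 fails, either because of a 0x00 high byte or because a low byte happens to equal '/'.

```c
#include <stddef.h>

/* Hypothetical helper: may these bytes form one pathname component on
 * a POSIX filesystem? Only NUL (terminator) and '/' (separator) are
 * reserved; every other byte value is accepted as-is. */
static int component_is_posix_safe(const unsigned char *p, size_t len)
{
    if (len == 0)
        return 0;                       /* an empty component is no name */
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\0' || p[i] == '/')
            return 0;
    return 1;
}

/* U+00C4 U+20AC in UTF-8: all multi-byte sequence bytes are >= 0x80. */
static const unsigned char utf8_name[] = { 0xC3, 0x84, 0xE2, 0x82, 0xAC };
/* "A" in little-endian UCS-2: the high byte is 0x00. */
static const unsigned char ucs2le_a[] = { 0x41, 0x00 };
/* U+012F in little-endian UCS-2: the low byte is 0x2F, i.e. '/'. */
static const unsigned char ucs2le_012f[] = { 0x2F, 0x01 };
```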

The console is a different issue. It currently uses Unicode as a
"lingua franca": it converts whatever 8-bit character set it is
currently displaying to Unicode (stage 1), then maps Unicode to the
appropriate glyph in the font (stage 2). This is the Right Thing[TM]
to do, although the handling of non-built-in stage 1 maps is
inadequate, and we need more fonts with stage 2 maps built in, so that
one doesn't need to load the map separately.
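A rough C sketch of the two-stage idea, assuming ISO 8859-15 as the 8-bit character set being displayed and a tiny, made-up font. None of these names, tables, or glyph numbers come from the kernel's console code; they only illustrate the flow byte -> Unicode -> glyph, with a fallback glyph when the font has no entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Stage 1 (illustrative): ISO 8859-15 byte -> Unicode. 8859-15 is
 * identical to Latin-1 except at the eight positions below. */
static uint16_t iso8859_15_to_unicode(unsigned char c)
{
    switch (c) {
    case 0xA4: return 0x20AC;           /* EURO SIGN */
    case 0xA6: return 0x0160;
    case 0xA8: return 0x0161;
    case 0xB4: return 0x017D;
    case 0xB8: return 0x017E;
    case 0xBC: return 0x0152;
    case 0xBD: return 0x0153;
    case 0xBE: return 0x0178;
    default:   return c;                /* same as Latin-1: U+0000..U+00FF */
    }
}

/* Stage 2 (illustrative): Unicode -> glyph slot in a made-up font.
 * Sorted by code point so it can be binary-searched. */
struct uni_glyph { uint16_t unicode; uint16_t glyph; };

static const struct uni_glyph font_map[] = {
    { 0x0020, 0x20 }, { 0x0041, 0x41 }, { 0x0042, 0x42 },
    { 0x00C4, 0x8E }, { 0x20AC, 0xEE }, { 0xFFFD, 0xFE },
};

#define REPLACEMENT_GLYPH 0xFE          /* glyph shown for U+FFFD */

static uint16_t unicode_to_glyph(uint16_t u)
{
    size_t lo = 0, hi = sizeof font_map / sizeof font_map[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (font_map[mid].unicode == u)
            return font_map[mid].glyph;
        if (font_map[mid].unicode < u)
            lo = mid + 1;
        else
            hi = mid;
    }
    return REPLACEMENT_GLYPH;           /* font has no glyph for u */
}

/* Full path for one byte written to the console. */
static uint16_t console_byte_to_glyph(unsigned char c)
{
    return unicode_to_glyph(iso8859_15_to_unicode(c));
}
```

The point of keying stage 2 by Unicode is that a font's map is written once and reused for every 8-bit character set; switching character sets only swaps the 256-entry stage 1 table.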

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
Always looking for a few good BOsFH.  **  Linux - the OS of global cooperation
        I am Baha'i -- ask me about it or see http://www.bahai.org/