Re: unicode

Bruce J. Bell (bruce@ugcs.caltech.edu)
15 May 1998 05:28:08 GMT


On Thu, 14 May 1998 13:35:16 -0400, Theodore Y. Ts'o <tytso@MIT.EDU> wrote:
> Date: Fri, 15 May 1998 00:55:43 -0700 (PDT)
> From: Alex Belits <abelits@phobos.illtel.denver.co.us>
>
> Because re-encoding is the last and worst thing that one may want to
> happen with them -- charsets/language labels are necessary for displaying
> characters with fonts that are mapped to charsets and applying rules that
> are mapped to languages (capitalization, hyphenation, phonetic match). The
> initial assumption is that adding reasonable support for fonts and rules
> is possible without exposing any other encoding or charset to application.
> Then no one re-encodes anything except when handling charset-specific
> devices or charset-specific filesystems.
>
>None of the above (capitalization, hyphenation, and phonetic match) are
>required for filenames. They are required if you are using a word
>processor (such as Microsoft Office's Word, which is also using Unicode
>internally to store all of their documents, so they've managed to solve
>this problem), but that's not what we're talking about here on the
>linux-kernel mailing list.

Please tell me again just where the Linux kernel proper needs to pay
attention to specific encodings?

I realize the need to transcribe filenames for "foreign" filesystems
mounted on Linux. E.g., DOS/Windows filesystems need to transcribe to
deal with "/" vs. "\"; there are similar problems for Mac filesystems
that permit any character in a filename.
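
As a rough sketch of the kind of transcription I mean (Python for brevity;
a real filesystem module would of course be C inside the kernel): a
Mac-style volume can hold "/" inside a filename, which Linux cannot
represent in a single path component. The function names and the choice of
":" as the stand-in are my own assumptions for illustration; the choice
relies on ":" being the path separator on classic Mac volumes and so not
appearing in their filenames.

```python
# Hypothetical sketch: swap "/" and ":" between on-disk and Linux-visible names.
LINUX_SEP = b"/"
STAND_IN = b":"   # assumed never to occur in an on-disk name


def disk_to_linux(name: bytes) -> bytes:
    """Make an on-disk name safe to show as one Linux path component."""
    if STAND_IN in name:
        raise ValueError("on-disk name already contains the stand-in byte")
    return name.replace(LINUX_SEP, STAND_IN)


def linux_to_disk(name: bytes) -> bytes:
    """Reverse the mapping when a name is written back to the volume."""
    if LINUX_SEP in name:
        raise ValueError("a Linux path component cannot contain '/'")
    return name.replace(STAND_IN, LINUX_SEP)
```

Because each byte maps to exactly one other byte, the transcription is
reversible, and nothing outside the filesystem module needs to know about it.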

If some other OS conventionally stores its filenames as EBCDIC, or some
flavor of Unicode, or something else, it makes sense for the module for
*that filesystem* to be able to convert filenames to and from some
"Linux-native" encoding such as ASCII or (hypothetically) Unicode, or
(possibly) something else. Is there any other situation where the kernel
should care about the encoding?
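
To make that concrete, here is a minimal sketch of such a per-filesystem
conversion, again in Python rather than kernel C. Everything named here is
an assumption for illustration: I picked the "cp037" codec as one common
EBCDIC code page, and UTF-8 as a stand-in for whatever the "Linux-native"
encoding turns out to be.

```python
# Hypothetical sketch of a filesystem module's filename conversion.
DISK_ENCODING = "cp037"     # assumption: one common EBCDIC code page
NATIVE_ENCODING = "utf-8"   # assumption: stand-in for a "Linux-native" encoding


def disk_to_native(raw: bytes) -> bytes:
    """Convert a filename as stored on the foreign volume to native bytes."""
    return raw.decode(DISK_ENCODING).encode(NATIVE_ENCODING)


def native_to_disk(name: bytes) -> bytes:
    """Convert a native filename back to the on-disk encoding.

    Raises UnicodeEncodeError if the name has no on-disk representation.
    """
    return name.decode(NATIVE_ENCODING).encode(DISK_ENCODING)
```

The point of the sketch is only that both functions live in the module for
that one filesystem; the rest of the kernel sees native bytes and nothing else.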

This filesystem-specific translation could be done by default, or at least
by some option specified at mount-time; I suppose you should be able to
tell the filesystem to let (say) its raw EBCDIC filenames through as much
as possible. However, outside that limited scope, is there any reason why
the kernel should bother to identify and translate between encodings?
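
A mount-time switch of that sort might look something like the sketch
below. The option name "fnenc", its "raw" value, and the helper names are
all invented here for illustration and are not existing mount options; I
made "raw" the default only to keep the sketch short, and translating by
default would be just as easy.

```python
import codecs


def parse_mount_options(options: str) -> dict:
    """Split a comma-separated option string such as "ro,fnenc=cp037"."""
    parsed = {}
    for item in options.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def make_name_reader(options: str, native: str = "utf-8"):
    """Return a function that maps on-disk filename bytes to native bytes."""
    fnenc = parse_mount_options(options).get("fnenc", "raw")
    if fnenc == "raw":
        # Pass the on-disk bytes through untouched.
        return lambda raw: raw
    # Reject unknown encodings at "mount" time (raises LookupError).
    codecs.lookup(fnenc)
    return lambda raw: raw.decode(fnenc).encode(native)
```

For example, make_name_reader("ro,fnenc=cp037") would translate EBCDIC
names, while make_name_reader("ro,fnenc=raw") would hand them back as stored.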

If not, maybe it would be best to restrict the discussion to the scope
it deserves. For instance, the question would be completely irrelevant
for kernel internals, for all native filesystems, and for most (?)
existing foreign filesystems as well.

That's not to say that it's not an important question. But I don't
see any concrete suggestions on how to satisfy anyone's needs. What
mount options would Alex Belits want for a Java-OS filesystem that used
16-bit Unicode?

Bruce

-- 
You are the lens of the world:
the lens through which the world may become aware of itself.
The world, on the other hand, is the only lens in which you can see yourself.
It is both lenses together that make vision.   (--R. A. MacAvoy)
