Re: UTF-8 practically vs. theoretically in the VFS API

From: Marc Lehmann
Date: Mon Feb 16 2004 - 15:35:00 EST


On Mon, Feb 16, 2004 at 09:21:42PM +0100, bert hubert <ahu@xxxxxxx> wrote:
> The remaining zit is that all these represent '..':

No, they don't. Read the UTF-8 definition...

> This in itself is not a problem, the kernel will only recognize 2E 2E as the
> real .., but it does show that 'document.doc' might be encoded in a myriad
> ways.

No, it can only be encoded in exactly one way *in UTF-8*. It can of course
be encoded differently in other encodings, but in UTF-8, there is only a
single representation. There are no ambiguities.

> So some guidance about using only the simplest possible encoding might be
> sensible, if we don't want the kernel to know about utf-8.

Fortunately, this has all already been taken care of, and is not a problem.

I mean, the _definition_ of UTF-8 works. Wether specific applications
(wether in the kernel or apps) work is a different question. But at
least the specification is rather clear.

Compare this to the URL definition, which only hints that you don't know
the encoding, and therefore, the interpretation as text, of a URL unless
you have an extra channel that communicates it.

While possible, this channel does not exist in practise, creating big
problems for people writing i18n-ized web applications.

The thing is that the kernel certainly _works_ on a very basic level, but
I think the situaiton can be improved by making it clear how to interpret
filenames, which currently is not the case.

--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@xxxxxxxx |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/