Re: UTF-8 practically vs. theoretically in the VFS API

From: H. Peter Anvin
Date: Wed Feb 18 2004 - 15:04:42 EST


Tomas Szepe wrote:
> On Feb-18 2004, Wed, 07:35 -0800
> Linus Torvalds <torvalds@xxxxxxxx> wrote:
>
>>But it makes perfect sense to use a policy of:
>> - escape valid UTF-8 characters as '\u7777'

[And e.g. \U00017777 for characters above \uFFFF]

>> - escape _invalid_ UTF-8 characters as their hex byte sequence (ie
>> '\xC0\x80\x80', whatever)
>> - (and, obviously, escape the valid UTF-8 character '\' as '\\').
>>
>>Don't you agree? It clearly allows all the cases, and you can re-generate
>>the _exact_ original stream of bytes from the above (ie it is nicely
>>reversible, which in my opinion is a requirement).
>
> I really really hope this is _exactly_ what we're going to see in practice.
>

Same here. This is clearly The Right Thing[TM].
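
For concreteness, here is a rough sketch of that policy, written in Python only because it is short, not because anything like it exists in the tree. The names escape() and unescape() are made up for illustration. Two things are assumptions of mine rather than part of the proposal: ASCII other than '\' passes through unchanged, and "valid" means whatever Python's strict UTF-8 decoder accepts (so overlong forms and encoded surrogates count as invalid bytes).

```python
def _char_at(raw: bytes, i: int):
    """Return (codepoint, length) if a valid UTF-8 character starts at raw[i], else None."""
    for n in (1, 2, 3, 4):
        chunk = raw[i:i + n]
        if len(chunk) < n:
            break
        try:
            # Strict decoding rejects overlong forms and surrogates.
            return ord(chunk.decode("utf-8")), n
        except UnicodeDecodeError:
            continue
    return None


def escape(raw: bytes) -> str:
    """Turn arbitrary bytes into a pure-ASCII, reversible escaped string."""
    out = []
    i = 0
    while i < len(raw):
        hit = _char_at(raw, i)
        if hit is None:
            out.append("\\x%02X" % raw[i])   # invalid byte: hex escape
            i += 1
            continue
        cp, n = hit
        if cp == 0x5C:
            out.append("\\\\")               # the backslash itself
        elif cp < 0x80:
            out.append(chr(cp))              # assumption: plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
        i += n
    return "".join(out)


def unescape(text: str) -> bytes:
    """Inverse of escape(): rebuild the exact original byte stream."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("ascii")
            i += 1
            continue
        kind = text[i + 1]
        if kind == "\\":
            out.append(0x5C)
            i += 2
        elif kind == "x":
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        elif kind == "u":
            out += chr(int(text[i + 2:i + 6], 16)).encode("utf-8")
            i += 6
        elif kind == "U":
            out += chr(int(text[i + 2:i + 10], 16)).encode("utf-8")
            i += 10
        else:
            raise ValueError("unknown escape at offset %d" % i)
    return bytes(out)
```

With this sketch, escape(b"\xC0\x80\x80") yields the twelve characters \xC0\x80\x80, and unescape(escape(b)) == b for any byte string b, which is the reversibility requirement above.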

-hpa
