Unicode normalization (userspace issue, but what the heck)

From: H. Peter Anvin
Date: Tue Feb 17 2004 - 21:51:03 EST


Followup to: <pan.2004.02.15.03.33.48.209951@xxxxxxxxxxxxxx>
By author: Matthias Urlichs <smurf@xxxxxxxxxxxxxx>
In newsgroup: linux.dev.kernel
>
> Not locale, but normalization problems and identical-glyph problems.
>
> Which is actually worse, because you don't have filenames which look
> like crap -- instead you have filenames which look perfectly sane, but
> they still do not work. Example: is an á one character, or is it an a
> followed by a composing ´?
>
> Mac OSX, just as an example, only uses decomposed filenames. I don't know
> the current situation, but 10.2 has major problems when you try to access
> files with composite characters in their name (across NFS for instance).
>
> I wonder if Linux, i.e. Linus ;-) should decree one single standard
> normalization. (I am NOT saying that enforcing this would be the kernel's
> job!)
>

I believe that for most applications, normalization form C should be
used.
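
To illustrate the á example from the quoted message, here is a minimal
Python sketch using the standard unicodedata module. It assumes the
"composing" accent means U+0301 COMBINING ACUTE ACCENT; the helper names
nfc() and same_name() are hypothetical, made up for this illustration.

```python
import unicodedata

# Two spellings of the same visible character.
composed = "\u00e1"      # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # U+0061 followed by U+0301 COMBINING ACUTE ACCENT


def nfc(name: str) -> str:
    """Return name in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", name)


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to form C."""
    return nfc(a) == nfc(b)


if __name__ == "__main__":
    # The raw strings differ, and so do their UTF-8 byte sequences,
    # which is what a byte-oriented filename lookup would compare.
    print(composed == decomposed)
    print(composed.encode("utf-8"), decomposed.encode("utf-8"))
    # After normalization to form C the two spellings compare equal.
    print(same_name(composed, decomposed))
```

The sketch normalizes only at comparison time in userspace; where
normalization ought to happen in a real system is a separate question
that the example does not try to answer.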

However, I suspect there are some applications for which this would
not apply.

-hpa