Oh please, let it die.
A filename is just a sequence of bytes. Since we like to list files, and people
like to see and use the symbols from their favourite character sets, we want
to extend a filename from a sequence of bytes to a sequence of glyphs. (They
might also have filesystems where they think of the filenames as sequences of
familiar characters, and they want to access those from Linux.)
Unicode allows us to have more than 256 glyphs that we can use at once, and
UTF-8 is just a convenient encoding that lets us keep using all the old stuff
everywhere (and we don't waste too many bytes, since much old stuff, e.g. ftp
sites, has names that are byte sequences), while we can continue seeing / and
\0 as the only special characters.
We didn't have language tagging before or after, we just gained an extension of
our usable glyph set.
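
To make the point about / and \0 concrete, here is a rough sketch in Python
(not kernel code; the function names and the sample path are made up for
illustration). It checks that no other code point's UTF-8 encoding contains
the bytes 0x2F or 0x00, and then splits a UTF-8 path on raw bytes only:

```python
# Bytes a byte-oriented path walker treats as special: NUL and '/'.
SPECIAL = {0x00, 0x2F}


def utf8_is_path_safe():
    """Return True if no code point other than NUL and '/' themselves
    produces a 0x00 or 0x2F byte when encoded as UTF-8."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if cp in SPECIAL:
            continue
        if any(b in SPECIAL for b in chr(cp).encode("utf-8")):
            return False
    return True


def split_path(raw):
    """Split a path on the byte 0x2F only, with no knowledge of the encoding."""
    return [part for part in raw.split(b"/") if part]


# Hypothetical path, chosen only because it mixes ASCII and non-ASCII names.
name = "/home/zoë/日本語.txt".encode("utf-8")
components = split_path(name)
print(utf8_is_path_safe())
print([c.decode("utf-8") for c in components])
```

The splitter never decodes anything, yet the multi-byte names come through
intact, because every byte of a multi-byte UTF-8 sequence is 0x80 or above.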
Think e.g. of how unix sees files as streams of bytes, though at the time
record based filesystems were popular. The idea is that the kernel should
just provide a simple model; your application programs do things like
interpreting \n as a record separator in text files (while it means no such
thing to a byte-interpreter working through a byte compiled program).
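
A small Python sketch of that division of labour (the file name and contents
are invented for the example): the file comes back from the kernel as bytes,
and two different "applications" read different meanings into the same 0x0A.

```python
import os
import tempfile

# Arbitrary sample contents: some text lines followed by a few raw bytes.
data = b"alpha\nbeta\n\x00\x0a\xff"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        raw = f.read()  # the kernel hands back bytes, nothing more

# Application A: a text tool that treats 0x0A as a record separator.
records = raw.split(b"\n")

# Application B: something like a bytecode interpreter, for which 0x0A
# is just the integer 10 among other integers.
values = list(raw)

print(records)
print(values)
```

Neither interpretation lives in the file or in the kernel; both are decisions
made by the program reading the bytes.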
Same thing with internationalization. The applications must think of how they
do things like language tagging, line breaking, direction of display etc.
Unicode just enables these programs by giving them a sufficiently rich glyph
set to play with. You are basically trying to push an application's
responsibility into something that's just a byte sequence<->glyph sequence
converter. Sure, we could try to do that, but we shouldn't. Wrong abstraction.
Unicode is an enabler of i18n, not an i18n method.
Discussion of what is a good encoding of the glyphs in the kernel and how we
best map filesystems with built-in codepage/character sets to and from our
brand new glyph universe in the kernel belongs here.
Discussion of why you think unicode is a misguided solution to a problem that
it was not trying to solve does not.
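
As an illustration of the codepage mapping question, here is a minimal
userspace sketch in Python. It assumes a filesystem that stores names in
CP437; the helper names to_utf8 and from_utf8 and the sample name are
hypothetical, and this is not how any particular kernel driver does it.

```python
def to_utf8(raw_name, codepage):
    """Map an on-disk name in the given codepage to UTF-8 bytes."""
    return raw_name.decode(codepage).encode("utf-8")


def from_utf8(utf8_name, codepage):
    """Map a UTF-8 name back to the filesystem's codepage.

    Raises UnicodeEncodeError if the codepage lacks one of the characters.
    """
    return utf8_name.decode("utf-8").encode(codepage)


# Sample on-disk name; in CP437 the byte 0x82 is e with acute accent.
on_disk = b"R\x82SUM\x82.TXT"

as_utf8 = to_utf8(on_disk, "cp437")
round_trip = from_utf8(as_utf8, "cp437")
print(as_utf8)
print(round_trip == on_disk)

# The reverse direction is lossy in general: CP437 has no CJK characters.
try:
    from_utf8("日本語".encode("utf-8"), "cp437")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

The last case is where the real design questions sit: what to do with a name
that the underlying filesystem's character set cannot represent.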
(maxim of the day: unicode is not the solution, but it's also not the problem)
-- 
My pid is Inigo Montoya. You kill -9 my parent process. Prepare to vi.