Re: unicode

Guest section DW (dwguest@win.tue.nl)
Fri, 15 May 1998 15:42:34 +0200 (MET DST)

Next message: Andrej Presern: "BM$"
Previous message: Robert Kaiser: "Re: DMA from/to user-space memory"

Jan Vroonhof writes:

> Since you do seem to know what you are talking about,
> are all of these cases (apart from the accents)
> like this [capital letter X (U+0058) vs Roman numeral X (U+2169)]
> where, given context, the human at least has a chance of succeeding?

Perhaps. Unicode is a mess.
For many symbols, font variations have been registered
as separate symbols.
For example, the script letters B, E, F, H, I, L, M, P, R
have separate Unicode encodings, but the remaining capitals do not.
For some of these an intended meaning is indicated
(Script B = Bernoulli function); for others one is
left in the dark (Script I).

So, even a human specialist who knows what his text is about,
who sits with the Unicode standard in his hands, will be unable
to decide whether his script I should be a U+2110, or is just
an I in a font with script characters.
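To make the ambiguity concrete, here is a minimal sketch in present-day
Python (not something available when this was written). It assumes the
standard unicodedata module; describe() and the three constants are names
made up for this illustration. It shows that the look-alikes are distinct
code points, and that compatibility normalization (NFKC) folds both the
Roman numeral and the script I back to the plain Latin letter.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, NFKC-folded form) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.normalize("NFKC", ch),
    )

# Hypothetical names, chosen for this illustration only.
LATIN_X = "\u0058"    # ordinary capital letter X
ROMAN_TEN = "\u2169"  # Roman numeral ten, usually drawn like an X
SCRIPT_I = "\u2110"   # script capital I

for ch in (LATIN_X, ROMAN_TEN, SCRIPT_I):
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(describe(ch)))
```

Note that the fold is one-way: once a U+2110 has been normalized to a
plain I, nothing records that a script letter was intended.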

There is much more to say about strange inconsistencies in Unicode,
but that would bring us too far off-topic.

What was the topic? Ted stated that he thought it a good idea
to agree that filenames are in UTF-8.
I disputed that - the filesystem stores bytes, both in files
and in filenames, nothing more - it is the task of a higher
level to worry about the languages, character sets and religious
beliefs of the user.
Since nobody contradicted this, maybe we now all agree on this part.
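
The division of labour described above can be sketched roughly as follows,
again in present-day Python. This is an illustration under stated
assumptions, not a description of any kernel interface: as_text() and
as_bytes() are hypothetical helpers standing in for the "higher level",
and the filename is simply a byte string that happens to be valid Latin-1
but not valid UTF-8.

```python
def as_text(name: bytes, encoding: str) -> str:
    """Interpret raw filename bytes under an encoding chosen by the caller.

    Hypothetical helper. Bytes that are invalid in that encoding are kept
    as lone surrogates (surrogateescape) rather than rejected.
    """
    return name.decode(encoding, errors="surrogateescape")

def as_bytes(text: str, encoding: str) -> bytes:
    """Inverse of as_text(): recover the original bytes exactly."""
    return text.encode(encoding, errors="surrogateescape")

# The stored name is just bytes; 0xE9 is e-acute in Latin-1
# and an incomplete sequence in UTF-8.
raw = b"caf\xe9"

latin1_view = as_text(raw, "latin-1")
utf8_view = as_text(raw, "utf-8")

# ascii() avoids printing a lone surrogate to the terminal.
print(ascii(latin1_view))
print(ascii(utf8_view))
print(as_bytes(utf8_view, "utf-8") == raw)
```

The same stored bytes yield different text depending on which encoding the
user-level code assumes, while the bytes themselves round-trip unchanged.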
