Re: A Great Idea (tm) about reimplementing NLS.

From: Måns Rullgård
Date: Fri Jun 17 2005 - 08:25:53 EST

Next message: Erik Slagter: "Re: Inspiron 6000 / ACPI S3 / PCI-X problems?"
Previous message: Lars Roland: "Re: tg3 in 2.6.12-rc6 and Cisco PIX SMTP fixup"
In reply to: Lennart Sorensen: "Re: A Great Idea (tm) about reimplementing NLS."
Next in thread: Robin Rosenberg: "Re: A Great Idea (tm) about reimplementing NLS."

lsorense@xxxxxxxxxxxxxxxxxxx (Lennart Sorensen) writes:
> You have probably slightly misunderstood UTF-8, at least. UTF-8 tries very
> hard to make sure you can't mistake its characters for ASCII, so the first
> byte of a multi-byte character contains some 1s followed by one zero. The
> number of 1s indicates how many bytes the character contains; after the 0,
> the remaining bits are used to store bits of the character. The remaining
> bytes are all 10xxxxxx, each storing another 6 bits of the character code.
> One is required to use the shortest form of UTF-8 that can store the
> character you are encoding.
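
To make the byte layout described above concrete, here is a minimal
sketch of a shortest-form encoder for a single code point. The name
utf8_encode is made up for illustration; real code would simply call
str.encode("utf-8").

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point in the shortest UTF-8 form (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # Plain ASCII: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Because each range check picks the smallest length that fits, the
sketch never emits an overlong form. A strict decoder rejects such
forms; Python's, for example, refuses b"\xc0\xaf" (an overlong "/").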

Some characters can be encoded in several equally short ways. For
instance, a character with multiple diacritics can have them applied
in different orders. One of these forms is designated the canonical
encoding and should be used in preference to the others. Things like
this, among others, are what make Unicode difficult to deal with.
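
A small sketch of the diacritic-order case, using Python's standard
unicodedata module. The helper name canonically_equal and the choice
of NFC as the form to compare in are assumptions made for the example,
not something the thread settled on.

```python
import unicodedata

# "a" with a dot below (U+0323) and an acute accent (U+0301),
# with the two combining marks applied in either order.
dot_then_acute = "a\u0323\u0301"
acute_then_dot = "a\u0301\u0323"


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ, and so do their UTF-8 bytes, although both
# byte sequences are valid shortest-form UTF-8 of the same length.
raw_differs = dot_then_acute != acute_then_dot
same_length = (len(dot_then_acute.encode("utf-8"))
               == len(acute_then_dot.encode("utf-8")))

# After normalization the two spellings compare equal.
same_after_nfc = canonically_equal(dot_then_acute, acute_then_dot)
```

A byte-wise comparison of filenames would treat the two spellings as
different names, which is the sort of trouble meant above.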

--
Måns Rullgård
mru@xxxxxxxxxxxxx
