Re: A Great Idea (tm) about reimplementing NLS.

From: Patrick McFarland
Date: Fri Jun 17 2005 - 03:52:26 EST

On Friday 17 June 2005 04:21 am, Måns Rullgård wrote:
> Patrick McFarland <pmcfarland@xxxxxxxxxxxx> writes:
> > On Thursday 16 June 2005 11:04 am, Lennart Sorensen wrote:
> >> Most people seem happy with 50 or so being a good limit even though
> >> many systems support much longer.
> >
> > 50 characters or 50 bytes? Because in the case of UTF-8, if you do a lot
> > of three byte characters (which require four bites to encode), 50 bytes
> > is very short.
> What do you mean by three-byte characters requiring four bytes to
> encode? Is a three-byte character not a character encoded using three
> bytes?

(implication of utf8 and not utf16 goes here)

Very few Unicode characters require three bytes, instead of the usual one or

For one byte you just have the byte.

For two bytes, you really have three: a control code stating "the following
two bytes are a two byte character", and then the two bytes.

For three bytes, you really have four bytes: a control code stating "the
following three bytes are a three byte character" and then the three bytes.

Unless I've completely misunderstood the Unicode specification, this is what
is going on.

> As for 50 bytes being too short, many of the multibyte characters are
> equivalent to several English characters, so fewer of them are
> required. You have a point, though.

Any English characters (ie, the first 127 ascii characters) map directly to
the first 127 Unicode characters (if thats what you meant).

Patrick "Diablo-D3" McFarland || pmcfarland@xxxxxxxxxxxx
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

Attachment: pgp00000.pgp
Description: PGP signature