Re: unicode (char as abstract data type)

H. Peter Anvin (hpa@transmeta.com)
17 Apr 1998 22:04:36 GMT


Followup to: <Pine.BSI.3.95.980417134327.7142E-100000@es1840.genesyslab.com>
By author: Alex Belits <abelits@phobos.illtel.denver.co.us>
In newsgroup: linux.dev.kernel
>
> > > UNICODE is more then just irritating. The problem is that the programming
> > > language thinks in terms of char* text. You start using wchar_t and before
> > > you know it, you have a huge mess and you just can't seem to get the types
> > > quite right anymore.
> >
> > That is why UTF8 is the right format to use in real situations. UTF8
> > works just like ascii in memory handling respects - its just that
> > x++ is no longer always move on one char and strlen(x) isnt the right
> > answer
>
> The problem is, for handling the data in applications UTF-8 is the very
> worst format ever invented by a human.

Hardly. Try UTF-7 or ISO 2022 if you want a truly hideous format, or
the rapidly deprecated UTF-1. Alex, you're already on record as
having an axe to grind because UTF-8 doesn't assign single-byte
encodings to Russian characters, so I presume everyone already knows
to take what you're saying with a grain of salt.

UTF-8 is actually very well done given the constraints imposed on it.
Yes, it's a compromise, but it had to be.
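
To make the quoted point about x++ and strlen(x) concrete, here is a
minimal sketch in C. The helper names utf8_codepoints and utf8_next
are made up for illustration (they are not from any library), and the
code assumes the input is well-formed UTF-8 and does no validation.
It relies only on the fact that UTF-8 continuation bytes always have
the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a NUL-terminated
 * UTF-8 string. Assumes well-formed input; nothing is validated. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; count every other byte. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: the UTF-8 replacement for x++. Returns a pointer
 * to the start of the next code point, or to the terminating NUL. */
const char *utf8_next(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return s;
    s++;
    /* Skip any continuation bytes belonging to the current code point. */
    while (((unsigned char)*s & 0xC0) == 0x80)
        s++;
    return s;
}
```

For the string "r\xC3\xA9sum\xC3\xA9" (résumé in UTF-8), strlen()
reports 8 bytes while utf8_codepoints() reports 6 characters. All the
usual char* memory handling (allocation, copying, NUL termination)
keeps working on the byte count.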

As for Unicode being irritating (responding to the > > > poster
above), I think we have to remember that internationalization is
*hard*, and part of why it's hard is that for the longest time you
couldn't even write any language other than bastardized English
(bastardized because you couldn't write words like naïve or résumé
properly). Now, with 8-bit charsets being common, people living in
countries where 8 bits are enough (especially ISO 8859-1 countries)
are whining about the complexity of supporting more than 8 bits.

I really would hate to see Linux falling behind in this area.

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
        I am Bahá'í -- ask me about it or see http://www.bahai.org/
   "To love another person is to see the face of God." -- Les Misérables
