Re: [2.6 patch] UTF-8 fixes in comments

From: Adrian Bunk
Date: Tue Apr 29 2008 - 06:43:08 EST


On Tue, Apr 29, 2008 at 12:09:34PM +0200, Willy Tarreau wrote:
> On Tue, Apr 29, 2008 at 11:06:05AM +0200, Helge Hafting wrote:
> > >Well, I accidentally used a freshly installed laptop running mandriva 2008.
> > >I was typing in a terminal inside KDE (I don't know the program name, sort
> > >of an xterm, but with huge borders all around). I made a typo in a word and
> > >typed in a "é" (e acute). Pressing backspace to fix it showed me that I
> > >remove more chars than typed. I tried again: I pressed this letter 5 times,
> > >then backspace 10 times, and removed 5 chars from the prompt. I suspect that
> > >if I had used some chars with a wider encoding (e.g. 4 bytes), I could have
> > >removed as many... Clearly those tools are not ready.
> > >
> > So don't use that particular tool
>
> It was not my machine, and had you been there, you would have heard me call
> it names !
>
> > and/or file a bug with the maintainer. :-)
>
> It's too easy to impose crappy designs on end-users and tell them that if
> that does not work they have to file a bug. There is a minimal set of
> things that must be tested before shipping. Seeing that the default
> terminal emulator in KDE on Mandriva 2008 is configured in UTF-8 and does
> not properly render it simply makes me sick. This is broken by design and
> even distros trying to get it working for years still can't cope with it.
> There must be a reason.

I can reproduce your problem in a plain xterm when setting LANG=en_US
(most likely the same problem can occur with other non UTF-8 settings).

In this case I'm actually more surprised that the character is displayed
correctly than that you have to type backspace twice.
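A minimal sketch of the mismatch, assuming the line editor works byte by byte under a non-UTF-8 locale: "é" is two bytes in UTF-8 (0xC3 0xA9), so each backspace removes only half of the character. The helpers `erase_bytewise`, `erase_charwise` and `count_presses` are hypothetical illustrations, not code taken from any shell or terminal.

```python
def erase_bytewise(buf: bytes) -> bytes:
    """Drop the last byte, as a locale-unaware (non-UTF-8) editor would."""
    return buf[:-1]


def erase_charwise(buf: bytes) -> bytes:
    """Drop the last UTF-8 character by skipping continuation bytes (10xxxxxx)."""
    i = len(buf) - 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:max(i, 0)]


def count_presses(buf: bytes, erase) -> int:
    """Count how many backspaces it takes to empty the buffer."""
    presses = 0
    while buf:
        buf = erase(buf)
        presses += 1
    return presses


typed = ("\u00e9" * 5).encode("utf-8")        # "é" typed 5 times
print(len(typed))                             # 10 bytes for 5 characters
print(count_presses(typed, erase_bytewise))   # 10: two backspaces per "é"
print(count_presses(typed, erase_charwise))   # 5: one backspace per "é"
```

One plausible reading of the symptom above: if the terminal moves the cursor back one column for every erased byte while each "é" occupies only one column, ten backspaces visually eat five typed characters plus five characters of the prompt.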

Any kind of charset mixing is highly problematic (which is also why my
patch was attached compressed), so if you disable UTF-8 anywhere in a
modern distribution, problems are somewhat expected (it could also be a
bug in Mandriva's default settings, but that would really surprise me).

>...
> > Unicode gives userland an opportunity to actually work decently
> > for the first time.
>
> Unicode yes, UTF-8 no. UTF-8 is a compressed encoding of unicode.
> That's as silly as if you had to replace your terminals to read
> native gzip, and expect them as well as all the tools to work
> properly!

It's not a compressed encoding, it's a variable-length encoding.
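To illustrate the difference: UTF-8 maps each code point on its own to 1 to 4 bytes depending only on its value, with no dictionary or state carried from one character to the next as a compressor like gzip uses. The encoder below is a hand-written sketch of the bit layout from RFC 3629 (it omits the surrogate-range check a real encoder needs), compared against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point using the RFC 3629 bit layout."""
    if cp < 0x80:       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")


sample = "A\u00e9\u20ac\U0001F600"  # A, e acute, euro sign, an emoji
for ch in sample:
    # U+0041 -> 41, U+00E9 -> c3 a9, U+20AC -> e2 82 ac, U+1F600 -> f0 9f 98 80
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex(' ')}")
```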

Besides the size advantages, one main advantage of UTF-8 is that ASCII is
valid UTF-8. This means that for the ASCII source code in the kernel it
doesn't matter whether it's treated as ASCII or UTF-8, and no conversion
was needed.
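A quick check of that property, as a sketch: every byte below 0x80 means the same thing in ASCII and in UTF-8, and every byte of a multi-byte UTF-8 sequence is 0x80 or above, so ASCII-only text never needs converting. The helper `decodes_identically` and the sample source line are made-up illustrations.

```python
def decodes_identically(data: bytes) -> bool:
    """True if data is ASCII and reads the same whether treated as ASCII or UTF-8."""
    try:
        return data.decode("ascii") == data.decode("utf-8")
    except UnicodeDecodeError:
        return False


# All 128 ASCII code units decode identically under both codecs.
print(decodes_identically(bytes(range(0x80))))  # True

# A hypothetical line of ASCII-only C source round-trips unchanged.
line = b"static int __init foo_init(void)\n"
print(line.decode("ascii").encode("utf-8") == line)  # True

# Non-ASCII characters use only bytes >= 0x80, so they can never be
# mistaken for ASCII characters such as '/' or NUL.
print(all(b >= 0x80 for b in "\u00e9\u20ac".encode("utf-8")))  # True
```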

You can't get this property with a fixed-size Unicode encoding.
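By contrast, with a fixed-size encoding such as UTF-32, every ASCII character grows to four bytes, three of them NUL, so existing ASCII files would no longer be valid as-is. A small sketch using Python's "utf-32-le" codec (the "-le" variant avoids the byte-order mark that the plain "utf-32" codec prepends); the source string is a hypothetical example.

```python
src = "int x;\n"  # hypothetical ASCII-only source text

utf8 = src.encode("utf-8")
utf32 = src.encode("utf-32-le")

print(utf8 == src.encode("ascii"))  # True: the UTF-8 bytes are the ASCII bytes
print(len(utf8), len(utf32))        # 7 28
print(utf32[:4])                    # b'i\x00\x00\x00'

# Reading the original ASCII bytes as UTF-32 fails: 7 bytes is not a
# whole number of 4-byte code units.
try:
    utf8.decode("utf-32-le")
except UnicodeDecodeError as exc:
    print("not valid UTF-32:", exc.reason)
```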

>...
> Willy

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
