Re: [2.6 patch] UTF-8 fixes in comments

From: Adrian Bunk
Date: Tue Apr 29 2008 - 05:44:30 EST


On Tue, Apr 29, 2008 at 10:14:23AM +0200, Willy Tarreau wrote:
> On Tue, Apr 29, 2008 at 10:29:11AM +0300, Adrian Bunk wrote:
> > On Tue, Apr 29, 2008 at 07:06:05AM +0200, Willy Tarreau wrote:
> > > On Mon, Apr 28, 2008 at 06:29:43PM -0700, H. Peter Anvin wrote:
> > > > Willy Tarreau wrote:
> > > > >Is this really needed Adrian ? I mean, everyone reads iso-8859-1, not
> > > > >everyone reads UTF-8.
> > > >
> > > > "Everyone" who speaks a Western European language, perhaps; and even
> > > > then, mostly because a lot of tools still have a "oh, it's not valid
> > > > UTF-8, guess iso-8859-1" mode.
> > >
> > > Or simply because people have not migrated all their install, or have
> > > explicitly disabled UTF-8 a few hours after starting to use it once
> > > they discovered the mess it caused and the poor support from the
> > > tools :-/
> >
> > Non-ancient distributions default to UTF-8 and have tools that handle it
> > fine.
> >
> > If you had bad experiences in the last millenium you should try again.
>
> Well, I accidentally used a freshly installed laptop running mandriva 2008.
> I was typing in a terminal inside KDE (I don't know the program name, sort
> of an xterm, but with huge borders all around). I made a typo in a word and
> typed in a "Ã" (e acute). Pressing backspace to fix it showed me that I
> remove more chars than typed. I tried again. Pressing this letter 5 times,
> then 10 times backspace. I removed 5 chars from the prompt. I suspect that
> if I had used some chars with wider encoding (eg 4 bytes), I could have
> removed as many... Clearly those tools are not ready.
>...

This sounds as if you had UTF-8 characters in a non UTF-8 environment.

If you did your "explicitly disabled UTF-8" then this is what triggered it.

> > > > The most common instance of non-ASCII
> > > > characters in Linux kernel code are people's names, and there are plenty
> > > > of names which aren't representable in either ASCII or iso-8859-1.
> > > >
> > > > The debate on this was years ago, and the consensus was to migrate to
> > > > UTF-8; however, the salient information should be expressed in the ASCII
> > > > character set unless impossible.
> > >
> > > And do we really consider that people's names in *comments* cannot
> > > be converted to pure ASCII ? I'm western european and have always
> > > been against accents in comments (another reason to write comments
> > > in english BTW).
> >
> > Accents are very rare in names in the kernel.
> >
> > Most non-ASCII characters are umlauts and there's no sane way to
> > express them in ASCII (and the vowels without umlaut are pronounced
> > quite differently and might even make names look very strange).
>
> Agreed, but it's been done for *years*. I received mails from people
> spelled "jorn" or "jurgen" and they had no trouble using that spelling
> in their names or mail addresses.

Email addresses are a different topic.

But it's not right in names, and if someone then pronounces their name
according to the wrong writing the result is also wrong.

> > And that's only within European languages, outside it becomes even
> > worse.
> >
> > > Unix and internet have lived without accents for
> > > almost 30 years without anyone really bothering. And now we try to
> > > put them everywhere (even in domain names, implying big security
> > > issues) and it causes real annoyances. People's names have not
> > > changed in 30 years, so I guess that the rules used during this
> > > time to ASCII-fy the names are still usable.
> >
> > The comments in the kernel have been converted to UTF-8 quite some time
> > ago, what I'm fixing with my patch is just some recent non-UTF-8 stuff
> > that creeped in.
>
> Well, if that had already begun, at least you're standardizing.
>...
> Willy

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/