Re: OT: character encodings (was: Linux 2.6.20-rc4)

From: Dave Jones
Date: Sun Jan 07 2007 - 15:06:25 EST


On Sun, Jan 07, 2007 at 07:17:30PM +0000, Russell King wrote:

> commit 24ebead82bbf9785909d4cf205e2df5e9ff7da32
> tree 921f686860e918a01c3d3fb6cd106ba82bf4ace6
> parent 264166e604a7e14c278e31cadd1afb06a7d51a11
> author Rafa Bilski <rafalbilski@xxxxxxxxxx> 1167691774 +0100
> committer Dave Jones <davej@xxxxxxxxxx> 1167799119 -0500
>
> and looking at that "author" closer with od:
>
> 0000140 74 68 6f 72 20 52 61 66 61 b3 20 42 69 6c 73 6b
> t h o r R a f a  B i l s k
>
> clearly not UTF-8. I doubt whether any of the commits I do on my
> en_GB ISO-8859-1 systems end up being UTF-8 encoded.

This has been bugging me for a while.
Viewing the mail I applied in mutt shows his name correctly as RafaÅ
Applying it with git-applymbox and viewing the log on master.kernel.org
with git log shows Rafa<B3> And then later when put into email
it turns into RafaÂ

> But the point is there is charset damage which has happened _long_ before
> Linus' action. There is no character set defined for the contents of git
> repositories, and as such the output of the git tools can not be
> interpreted as any one single character set.

If there's something I should be doing when I commit that I'm not,
I'll be happy to change my scripts. My $LANG is set to en_US.UTF-8
which should DTRT to the best of my knowledge, but clearly, that isn't
the case.

Dave

--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/