Re: OT: character encodings (was: Linux 2.6.20-rc4)

From: David Woodhouse
Date: Sun Jan 07 2007 - 23:44:11 EST


On Sun, 2007-01-07 at 15:05 -0500, Dave Jones wrote:
> This has been bugging me for a while.
> Viewing the mail I applied in mutt shows his name correctly as RafaÅ
> Applying it with git-applymbox and viewing the log on master.kernel.org
> with git log shows Rafa<B3> And then later when put into email
> it turns into RafaÂ

I believe you need to use the misnamed '-u' option to git-applymbox,
which _really_ ought to be the default behaviour. Otherwise, it fails to
pay any attention to the character set tags in the mail it's decoding --
it commits the sin which rmk was whining about; assuming the input data
is of a given type and ignoring the explicit tags which indicate the
contrary.

The '-u' option is misdocumented as 'causes the resulting commit to be
encoded in utf-8', but in fact I believe it doesn't necessarily do that
-- it actually causes the resulting commit to be encoded in the
configured storage charset for the repository, which just _happens_ to
default to UTF-8 unless otherwise specified. That is something which
should definitely be the _default_ behaviour.

We should make the '-u' behaviour the default, and if anyone really
wants the old behaviour of importing arbitrary data in untagged
binary form overriding its labelling then they can have a separate
option which does that.

--
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/