Re: [Patch] Support UTF-8 scripts

From: "Martin v. Löwis"
Date: Fri Sep 16 2005 - 15:42:06 EST


H. Peter Anvin wrote:
> You don't have markers (although they're defined, see ISO 2022) for your
> 8-bit encodings, and *THEY'RE THE ONES THAT NEED TO BE DISTINGUISHED.*
> Flagging UTF-8, especially with the BOM (as opposed to the ISO 2022
> signature, <ESC>%G) is pointless in the context, since you still can't
> distinguish your arbitrary number of legacy encodings.

In programming languages that support the notion of source encodings,
you do have markers for 8-bit encodings. For example, in Python, you
can specify

# -*- coding: iso-8859-1 -*-

to denote the source encoding. In Perl, you write

use encoding "latin-1";

(with 'use utf8;' being a special-case shortcut).

In Java, you can specify the encoding through the -encoding argument
to javac. In gcc, you use -finput-charset (with the special case of
-fexec-charset and -fwide-exec-charset potentially being different).

So you *must* use encoding declarations in some languages; the UTF-8
signature is a particularly convenient way of doing so, since it allows
for uniformity across languages, with no need for the text editors to
parse all the different programming languages.

Regards,
Martin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/