Re: [Patch] Support UTF-8 scripts

From: "Martin v. LÃwis"
Date: Sat Sep 17 2005 - 01:29:14 EST


Bernd Petrovitsch wrote:
> On Fri, 2005-09-16 at 22:41 +0200, "Martin v. LÃwis" wrote:
> [ Language-specific examples ]
>
> And that's the only working way - the programming languages can actually
> do it because it defines the syntax and semantics of the contents
> anyways.

It works from the programming language point of view, but it is a mess
from the text editor point of view.

Even for the programming language, it is a pain to implement: what
if you have non-ASCII characters before the pragma that declares the
encoding? and so on.

> With this marker you are interferign with (at least) *all* text files.

Hmm. What does that have to do with the patch I'm proposing? This
patch does *not* interfere with all text files. It is only relevant
for executable files starting with the #! magic.

> And thus with *all* tools which "handle" those text files.

This is simply not true. My patch does not interfere with any such
tools. They continue to work just fine.

>>So you *must* use encoding declarations in some languages; the UTF-8
>
>
> ... if you absolutely want to use Non-ASCII characters in the source
> code. In most (if not all) of them exist a native gettext()
> interface ...

True. However, this is more tedious to use. Also, it doesn't apply to
all cases: e.g. if you have comments, documentation etc. in the source
code, gettext is no option.

Likewise, people often want to use non-ASCII in identifiers (e.g. class
LÃsung); this can also only work if you know what the source encoding
is. You may argue that people just shouldn't do that, because it does
not work well, but this is not convincing: it doesn't work well because
language developers are to lazy to implement it. In fact, some languages
(C, C++, Java, C#) do support non-ASCII identifiers (atleast in their
specifications); there really isn't a good reason not to support it
in scripting languages as well.

> And there are always tools out there which simply do not understand the
> generic marker and can not ignore it since these bytes are part of the
> file.

This conclusion is false. Many tools that don't understand the file
structure still can do their job on the files. So the fact that a tool
does not understand the structure does not necessarily imply that
the tool breaks when the structure changes.

> Or another example: (Try to) start a perl/shell/... script (without
> paranmeter on the first line) which was edited on Win* and binary copied
> to a Unix system. Or at least guess what will happen ....

For a Python script, I don't need to guess: It will just work.

Regards,
Martin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/