Re: unicode (char as abstract data type)

Lin Zhe Min (ljm@ljm.wownet.net)
Fri, 15 May 1998 00:06:42 +0900 (CDT)


First, I do think we should go back to kernel issues, or we are
INTRUDERS here. I really do not want to annoy anyone concerned.

Second, some bad news that everyone seems to know already: M$ won
the suit, and M$ Windows 98 is to be published on time (the 15th of
May). It really is a market imbalance, and it has the side effect of
tying the US government and M$ together, so that in the future the
government will have to spend even greater effort to catch it and
dispose of it. :<

On Mon, 11 May 1998, Alex Belits wrote:

> My definition of text file is a file that contains a stream of
> characters that represent text, possibly separated into lines. MIME type
> text/plain. Postscript files, executables and even images also contain
> text, however some people see them as different type of files.

Yes, by your definition an (ugly) MS Word document is a text file,
which you did not seem to agree with. However, let's stop this
discussion, because it is almost meaningless for the Linux kernel.
Text files and their manipulation live in user space.

> > The fact of the matter is, people are using Unicode for storing and
> > processing text.
> Again, check what is actually used and where. IETF "standardized" HTML
> on Unicode, too.

I don't think we can all reach a local library. For the sake of stupid
M$ users, the National Library in Taiwan uses Big5 encoding on its
pages. However, you may try telnet://192.192.13.10, the Taiwan
provincial branch of the National Library. It has a Unicode book index.

> > So will ext2fs --- not because Microsoft said so, but
> > because it's the right thing.
> Unicode is ugly and unusable, and despite your and IETF hypocrisy, such
> idiocy won't be used. I can speak only from my experience, and every
> Russian-speaking programmer that I have seen, when asked if he is going to
> use Unicode anywhere outside charset translator said that he won't do it
> in any case.

However, I urge a complete internationalisation method. There must be
one way, whether the one Unicode is pointing to now or another that
you may point to. Not only filenames but also terminals and text
editors (even TeX) should, I hope, support manipulating CJK, Cyrillic,
Arabic and other texts simultaneously. To reach this goal there must
be some changes in the kernel, in libc and in ext2fs, to make
compatibility easier. Otherwise there will be a mess of MULE
documents, things made with TeX that require huge fonts, or
precompiled output that is not readable or reusable by the rest of
the world. And I need multilingual (multi-charset) filenames too. How
do you think I should send files with Japanese names to my Japanese
friend, who cannot see or decode Chinese Big5 filenames? It's hard to
use English filenames, because English is not our native language.
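
To make the filename problem above concrete, here is a minimal Python
sketch. The filename and the helper view_as are made up for the
illustration, and Shift_JIS is only assumed as the encoding a Japanese
system might use. It shows the same bytes being read correctly under
one charset and turning into garbage under Big5, while a single shared
encoding such as UTF-8 round-trips cleanly.

```python
# Hypothetical filename "日本語.txt", written with escapes so the
# source itself stays plain ASCII.
NAME = "\u65e5\u672c\u8a9e.txt"


def view_as(raw: bytes, charset: str) -> str:
    """Decode raw filename bytes the way a system assuming `charset` would.

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return raw.decode(charset, errors="replace")


# Bytes as a system using Shift_JIS would store the name (an assumption).
sjis_bytes = NAME.encode("shift_jis")
# Bytes as stored when both sides agree on UTF-8.
utf8_bytes = NAME.encode("utf-8")

# Same bytes, right charset: the name survives.
print(view_as(sjis_bytes, "shift_jis") == NAME)
# Same bytes, wrong charset (Big5): the name does not survive.
print(view_as(sjis_bytes, "big5") == NAME)
# One shared encoding on both sides: the name survives.
print(view_as(utf8_bytes, "utf-8") == NAME)
```

The point is only that the bytes carry no charset label, so the
receiving side has to guess unless both sides agree on one encoding.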

After all, Unicode is, also in my experience, a workable way. It may
not be perfect, but it does the job.

However, one thing comes to my mind: if ext2fs is to use UTF-8 (or
UCS-2, whatever), where in the kernel and in libc would changes be
needed? How effective would they be? What side effects? I've read
almost every message, but most of them only say "I agree", "I don't
agree" or "it's all a mess". That won't help.
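
One small, checkable piece of the UTF-8 versus UCS-2 question is how
each encoding interacts with the two bytes that Unix pathnames
reserve: NUL (0x00) and '/' (0x2F). The sketch below is only an
illustration. The helper reserved_bytes and the sample names are made
up, and UCS-2 is approximated with Python's "utf-16-be" codec, which
produces the same bytes for characters in the Basic Multilingual
Plane.

```python
def reserved_bytes(name: str, encoding: str) -> set:
    """Return which reserved path bytes (0x00, 0x2F) occur in the encoded name."""
    encoded = name.encode(encoding)
    return {b for b in (0x00, 0x2F) if b in encoded}


# Hypothetical filename mixing ASCII and CJK: "readme-日本語.txt".
name = "readme-\u65e5\u672c\u8a9e.txt"

# UTF-8 encodes every non-ASCII character using only bytes >= 0x80,
# so no NUL or '/' byte can appear by accident.
print(reserved_bytes(name, "utf-8"))

# A 16-bit encoding pads every ASCII character with a 0x00 byte.
print(reserved_bytes(name, "utf-16-be"))

# U+2F00 alone encodes to the bytes 0x2F 0x00 in a 16-bit encoding,
# which a byte-oriented path parser would read as '/' followed by NUL.
print(reserved_bytes("\u2f00", "utf-16-be"))
```

If this holds, UTF-8 names could pass through code that treats
filenames as NUL-terminated byte strings, while 16-bit names could
not. Whether that settles the kernel side is a separate question.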

If there is a vote, count a 'yes' from a programmer in China.

.e'osai ko sarji la lojban. ==> Please support the logical language.
co'o mi'e lindjy,min. ==> Goodbye, I'm Lin Zhe Min.
Fingerprint20 = CE32 D237 02C0 FE31 FEA9 B858 DE8F AE2D D810 F2D9
