Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)

From: John Bradford
Date: Mon Feb 16 2004 - 10:39:47 EST


> > - convert all files from the previous charset to UTF-8 overnight
> > if the previous charset was unknown, first make sure that you can
> > guess it for all users and contact users that have files with
> > suspicous filenames (eg. not convertable from Latin1). Or look troug=
> h
> > their shell/X config files (*)
>
> Hazardous.
>
> > - in libc, implement a recoding function to convert file names from
> > LC_CTYPE to the underlying UTF-8 encoding
>
> Hmm.. could be fun if somebody is calling 'open', and the UTF-8 encoding
> requires the insertion of extra characters to encode it - what do you do =
> then?
> That looks like a security hole just waiting to happen. Probably has lot=
> s of
> lurking corner cases too - what if you creat() a file, then do a readdir(=
> ) and
> strcmp() each entry looking for your file (while comparing a filename sma=
> shed
> to UTF-8 to the original unsmashed string)?

The current situation is that so many applications simply treat
filenames as arbitrary sequences of bytes. With many encodings, this
simply happens to work, and an encoding mis-match will result in some
incorrect characters being displayed for byte values > 127. However,
some encodings, such as UTF-8, are simply _not_ compatible with the
'you can also treat it like an arbitrary byte string model', and there
is a very real potential for security holes in bad implementations if
we go down the "it's an arbitrary byte string, but you _should_ store
UTF-8 there" route.

Maybe we should forget filename encoding altogether, and start
thinking of filenames as arbitrary sequences of _32-bit words_.
Existing applications can store their arbitrary byte sequences in the
low byte, and new calls can be added to provide Unicode-aware
userspace applications with access to the 32-bit space, which _must_
be used for UCS-4.

John.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/