Re: vfat: Broken case-insensitive support for UTF-8

From: Al Viro
Date: Sun Jan 19 2020 - 18:08:24 EST


On Sun, Jan 19, 2020 at 11:14:55PM +0100, Pali Rohár wrote:

> So when UTF-8 on VFS for VFAT is enabled, the utf16s_to_utf8s() and
> utf8s_to_utf16s() functions are used for VFS <--> VFAT conversion.
> But fat_name_match(), vfat_hashi() and vfat_cmpi() use the NLS table
> (default iso8859-1) with nls_strnicmp() and nls_tolower().
>
> Which means that fat_name_match(), vfat_hashi() and vfat_cmpi() are
> broken for vfat in UTF-8 mode.
>
> I was thinking about how to fix it, and the only possible way is to write
> a uni_tolower() function which takes one Unicode code point and returns
> the lowercase of that code point. We cannot do any Unicode normalization,
> as the VFAT specification does not say anything about it and the MS
> reference fastfat.sys implementation does not do it either.

Then how can that possibly be broken? If it matches the native behaviour,
that's it.

> As you can see, lowercase 'd' and uppercase 'D' are the same, but lowercase
> 'č' and uppercase 'Č' are not the same. This is because 'č' is the two-byte
> sequence 0xc4 0x8d and the comparison is done with the Latin1 table. 0xc4 is
> Latin1 'Ä', which is already uppercase. 0x8d is a control char, so it is not
> changed by the tolower/toupper functions.

Again, who the hell cares? Does the behaviour match how Windows handles
that thing? "Case" is not something well-defined; the only definition
is "whatever weird crap does the native implementation choose to do".
That's the only reason to support that garbage at all...