Re: [PATCH] vsscanf() in lib/vsprintf.c

From: Stefan Kanthak
Date: Wed May 05 2021 - 12:54:57 EST


Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> wrote:
> On 04/05/2021 21.19, Stefan Kanthak wrote:
>> Hi @ll,
>>
>> both <https://www.kernel.org/doc/htmldocs/kernel-api/API-sscanf.html>
>> and <https://www.kernel.org/doc/htmldocs/kernel-api/API-vsscanf.html>
>> are rather terse and fail to specify the supported arguments and their
>> conversion specifiers/modifiers.
>>
>> <https://www.kernel.org/doc/htmldocs/kernel-api/libc.html#id-1.4.3>
>> tells OTOH:
>>
>> | The behaviour of these functions may vary slightly from those
>> | defined by ANSI, and these deviations are noted in the text.
>>
>> There is but no text (see above) despite multiple deviations from
>> ANSI C
>>
>> <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/lib/vsprintf.c?h=v5.12>
>>
>> | /* '%*[' not yet supported, invalid format */
>> ...
>> | /*
>> | * Warning: This implementation of the '[' conversion specifier
>> | * deviates from its glibc counterpart in the following ways:
>> ...
>>
>> More deviations (just from reading the source):
>>
>> 1. no support for %p
>
> What on earth good would that do in the kernel?

| The behaviour of these functions may vary slightly from those
| defined by ANSI, and these deviations are noted in the text.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2. no support for conversion modifiers j and t
>
> Could be added, but do you have a user?

Just fix your documentation.

>> 3. no support for multibyte characters and strings, i.e. %<width>c
>> and %<width>s may split UTF-8 codepoints
>
> So what?

It's a BUG!

> The kernel doesn't do a lot of text processing and wchar_t stuff.

Nobody will ever feed a UTF-8 string to the kernel?

>> 4. accepts %[<width>]<modifier>[c|s], but ignores all conversion
>> modifiers
>
> Yeah, %ls is technically accepted and treated as %s,

just like %Ls and %Hs and %hhs and %zs ... what the documentation
but fails to tell: just fix it.

> that's mostly for ease of parsing it seems. Do you have a use
> case where you'd want wchar_ts?

>> 5. treats %<width><modifier>% (and combinations) as %%
>
> What would you expect it to do?

See the patch: stop and return the number of converted items, like
an ANSI/ISO conformant scanf()

> Seems to be a non-issue, gcc flags that nonsense just fine

Nobody will ever feed a non-constant format string to [v]sscanf()?

>> 6. accepts %<width><modifier>n (and combinations)
>
> Again, non-issue (warning: field width used with ‘%n’ gnu_scanf format)

How does gnu_scanf() handle %0Ln etc.?
Does a warning stop compilation of the kernel?

See above: it's undocumented, and it's not flagged in calls with
non-constant format string.

>> 7. doesn't scan the input for %[...]n
>
> ? What's that supposed to mean.

Argh, my fault: should have been %*

>> 8. uses simple_strto[u]l for the conversion modifier z, i.e. assigns
>> uint32_t to size_t, resulting in truncation
>
> Where do you see uint32_t?

LLP64 vs. LP64, so my last point is invalid.

Stefan