Re: [PATCH 05/10] parse_integer: convert lib/

From: Rasmus Villemoes
Date: Mon May 04 2015 - 10:10:41 EST


On Sat, May 02 2015, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:

> match_number() needlessly allocates/duplicates memory,
> parsing can be done straight from original string.

I suppose that's true, but the patch doesn't seem to do anything about
it? It's probably better to do in a separate cleanup anyway, but then
maybe this belongs as a comment below ---.


> Signed-off-by: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> ---
>
> lib/cmdline.c | 36 ++++++++++++++++++------------------
> lib/parser.c | 29 ++++++++++++-----------------
> lib/swiotlb.c | 2 +-
> 3 files changed, 31 insertions(+), 36 deletions(-)
>
> --- a/lib/cmdline.c
> +++ b/lib/cmdline.c
> @@ -27,7 +27,7 @@ static int get_range(char **str, int *pint)
> int x, inc_counter, upper_range;
>
> (*str)++;
> - upper_range = simple_strtol((*str), NULL, 0);
> + parse_integer(*str, 0, &upper_range);
> inc_counter = upper_range - *pint;
> for (x = *pint; x < upper_range; x++)
> *pint++ = x;
> @@ -51,13 +51,14 @@ static int get_range(char **str, int *pint)
>
> int get_option(char **str, int *pint)
> {
> - char *cur = *str;
> + int len;
>
> - if (!cur || !(*cur))
> + if (!str || !*str)
> return 0;
> - *pint = simple_strtol(cur, str, 0);
> - if (cur == *str)
> + len = parse_integer(*str, 0, pint);
> + if (len < 0)
> return 0;
> + *str += len;
> if (**str == ',') {
> (*str)++;
> return 2;
> @@ -126,38 +127,37 @@ EXPORT_SYMBOL(get_options);
>
> unsigned long long memparse(const char *ptr, char **retptr)
> {
> - char *endptr; /* local pointer to end of parsed string */
> + unsigned long long val;
>
> - unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
> -
> - switch (*endptr) {
> + ptr += parse_integer(ptr, 0, &val);

This seems wrong. simple_strtoull used to "sanitize" the return value
from the (old) _parse_integer, so that endptr still points into the
given string. Unconditionally adding the result from parse_integer may
make ptr point far before the actual string, into who-knows-what.

> + switch (*ptr) {
> case 'E':
> case 'e':
> - ret <<= 10;
> + val <<= 10;
> case 'P':
> case 'p':
> - ret <<= 10;
> + val <<= 10;
> case 'T':
> case 't':
> - ret <<= 10;
> + val <<= 10;
> case 'G':
> case 'g':
> - ret <<= 10;
> + val <<= 10;
> case 'M':
> case 'm':
> - ret <<= 10;
> + val <<= 10;
> case 'K':
> case 'k':
> - ret <<= 10;
> - endptr++;
> + val <<= 10;
> + ptr++;
> default:
> break;
> }
>
> if (retptr)
> - *retptr = endptr;
> + *retptr = (char *)ptr;

And here we propagate that to the caller.

> - return ret;
> + return val;
> }
> EXPORT_SYMBOL(memparse);
>
> --- a/lib/parser.c
> +++ b/lib/parser.c
> @@ -44,7 +44,7 @@ static int match_one(char *s, const char *p, substring_t args[])
> p = meta + 1;
>
> if (isdigit(*p))
> - len = simple_strtoul(p, (char **) &p, 10);
> + p += parse_integer(p, 10, (unsigned int *)&len);

Hm, I think passing cast expressions to parse_integer should be actively
discouraged. While it might work in this case, some day somebody will
copy-paste this to a place where the len variable doesn't have
sizeof==4.

> else if (*p == '%') {
> if (*s++ != '%')
> return 0;
> @@ -68,19 +68,21 @@ static int match_one(char *s, const char *p, substring_t args[])
> break;
> }
> case 'd':
> - simple_strtol(s, &args[argc].to, 0);
> + /* anonymous variable */
> + len = parse_integer(s, 0, &(int []){0}[0]);
> goto num;
> case 'u':
> - simple_strtoul(s, &args[argc].to, 0);
> + len = parse_integer(s, 0, &(unsigned int []){0}[0]);
> goto num;
> case 'o':
> - simple_strtoul(s, &args[argc].to, 8);
> + len = parse_integer(s, 8, &(unsigned int []){0}[0]);
> goto num;
> case 'x':
> - simple_strtoul(s, &args[argc].to, 16);
> + len = parse_integer(s, 16, &(unsigned int []){0}[0]);

I see that the commit log says "don't be scared", and the first of these
even has a comment. But is there any reason to be that clever here? I
see a couple of downsides:

* gcc has to initialize some stack memory to 0, since it cannot know it
is only an output parameter.

* things like this usually consume an excessive amount of stack. I
haven't been able to try your patches, but a simplified version of the
above shows that gcc doesn't use the same stack slots for the various
anonymous variables.

* It's unreadable, despite the comment and the commit log.

I suggest using the more obvious approach of declaring a union on the
stack and pass the address of the appropriate member.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/