Re: [PATCH] string: Improve the generic strlcpy() implementation

From: Rasmus Villemoes
Date: Wed Oct 07 2015 - 05:04:48 EST


On Wed, Oct 07 2015, Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> We have procfs and sysfs as well, where format strings are indeed dominant, but
> are you sure this race exists in snprintf() in that form? I.e. can the return
> value of snprintf() be different from the true length of the output string, if the
> source string is modified in parallel?

Well, if truncation has happened the return value is different
(larger). But assuming the output buffer is large enough, the 'compute
strlen, then do copying, potentially copying a nul byte which wasn't
there moments before' is pretty obvious:

lib/vsprintf.c:

static noinline_for_stack
char *string(char *buf, char *end, const char *s, struct printf_spec spec)
{
int len, i;

if ((unsigned long)s < PAGE_SIZE)
s = "(null)";

len = strnlen(s, spec.precision);

if (!(spec.flags & LEFT)) {
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
}
for (i = 0; i < len; ++i) {
if (buf < end)
*buf = *s;
++buf; ++s;
}
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}

return buf;
}

(spec.precision is an s16 which by default is set to -1, so for the
usual case of plain %s the upper bound in strnlen is (size_t)-1,
effectively infinity). If it wasn't for the field width padding it would
probably not be that hard to fix.

But, to rephrase an earlier question: Can anyone point to an instance
where the strlcpy source or a %s argument to a printf function can
actually change under us? I'd like to see if one can intentionally
trigger the potential race, but I suspect that the vast majority cannot
have a problem - maybe someone has an idea of specific places that are
worth looking at.

>> > So I'd improve it all in the following order:
>> >
>> > - fix the strscpy() uninitialized use
>> >
>> > - base strlcpy() on strscpy() via the patch I sent. This makes all users faster
>> > and eliminates the theoretical race.
>>
>> I'm not so sure about that part. I'd strongly suspect that the vast
>> majority of strings handled by strlcpy (and in the future strscpy) are
>> shorter than 32 bytes, so is all the word_at_a_time and pre/post
>> alignment yoga really worth it?
>
> That's a good question, I'll measure it.

Here's a few pseudo-datapoints. About half the strlcpy instances (just
from lazy grepping) has a string literal as src, and those only have
about 1/8th chance of being aligned. A quick skim through a small
vmlinux showed 34 calls of strlcpy where %rsi was easily seen to be a
literal address, of which 7 were aligned. That's 1/5, but the sample
size is rather small (also, the 1/8 is admittedly an underestimate,
since gcc seems to put long enough literals in rodata.str1.8).

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/