[PATCH] string: Improve the generic strlcpy() implementation

From: Ingo Molnar
Date: Mon Oct 05 2015 - 07:27:10 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Sep 10, 2015 at 8:43 PM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
> >
> > Please pull the following changes for 4.3 from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git strscpy
>
> So I finally pulled it. I like the patch, I like the new interface,
> but despite that I wasn't really sure if I wanted to pull it in - thus
> the long delay of me just seeing this in my list of pending pulls for
> almost a month, but never really getting to the point where I decided
> I want to commit to it.

Interesting. I noticed that strscpy() says this in its comments:

* In addition, the implementation is robust to the string changing out
* from underneath it, unlike the current strlcpy() implementation.

The strscpy() interface is very nice, but shouldn't we also fix this strlcpy()
unrobustness/race it refers to, in light of the 2000+ existing strlcpy() call
sites?

The comment does not spell out what the exact race is, but I can see only a single
race in the current generic strlcpy() implementation, which all architectures
except s390 uses:

size_t strlcpy(char *dest, const char *src, size_t size)
{
size_t ret = strlen(src);

if (size) {
size_t len = (ret >= size) ? size - 1 : ret;
memcpy(dest, src, len);
dest[len] = '\0';
}
return ret;
}

If another CPU or an interrupt changes the source string after the strlen(), but
before the copy is complete, and shortens the source string, then we copy over the
NUL byte of the source buffer - including fragments of earlier source string
tails. The target buffer will still be properly NUL terminated - but it will be a
shorter string than the returned 'ret' source buffer length. (despite there not
being truncation.)

The s390 arch implementation has the same race AFAICS.

This may cause bugs if the return code is subsequently used to assume that it is
equal to the destination string's length. (While in reality it's shorter.)

The race is not automatically lethal, because it's guaranteed that the returned
length is indeed zero-delimited (due to the overlong copy we did) - so if the
string is memcpy()-ed, then it will still result in a weirdly padded but valid
string.

But if any subsequent use of the return code relies on the return code being equal
to a subsequent call of strlen(dest), then that use might lead to bugs. I.e. our
implementation of strlcpy() is indeed racy and unrobust.

But we could fix this race: by iterating over the string in a single go and
determining the length and copying the string at once. Like the new strscpy() code
does it, but with strlcpy() semantics.

This will also make strlcpy() faster, FWIMBW.

I also noticed another problem with our current generic strlcpy() implementation:
AFAICS it will also happily do bad stuff if we pass it a negative size. Instead of
that we should print a warning and return safely.

I've implemented all that, see the patch below.

(Only very lightly tested so far and no benchmarking done.)

Thanks,

Ingo

===================================>