Re: [lkp-robot] [x86/refcount] b631e535c6: WARNING:at_net/netlink/af_netlink.c:#netlink_sock_destruct

From: Hans Liljestrand
Date: Thu Jul 27 2017 - 09:34:37 EST


On Tue, Jul 25, 2017 at 11:38:30AM -0700, Kees Cook wrote:
On Tue, Jul 25, 2017 at 3:43 AM, Hans Liljestrand
<liljestrandh@xxxxxxxxx> wrote:
On Mon, Jul 24, 2017 at 08:21:16PM -0700, Kees Cook wrote:

On Mon, Jul 24, 2017 at 6:03 AM, Hans Liljestrand
<liljestrandh@xxxxxxxxx> wrote:

On Sun, Jul 23, 2017 at 08:52:53PM -0700, Kees Cook wrote:


Is 14afee4b6092f ("net: convert sock.sk_wmem_alloc from atomic_t to
refcount_t") correct? That looks like a statistics counter, not a
refcounter? I can't quite tell, though...



Hmm, yes, it looks a bit weird, but it is used in a refcount fashion
here:

void sk_free(struct sock *sk)
{
	/*
	 * We subtract one from sk_wmem_alloc and can know if
	 * some packets are still in some tx queue.
	 * If not null, sock_wfree() will call __sk_free(sk) later
	 */
	if (refcount_dec_and_test(&sk->sk_wmem_alloc))
		__sk_free(sk);
}

http://elixir.free-electrons.com/linux/v4.13-rc1/source/net/core/sock.c#L1605


Ah yeah, there it is. Hrmpf. Something is triggering WARNs, though...
I wonder if this can get examined more closely?


I tried reproducing the error but I don't seem to know how to use lkp. Got
lots of permission denied errors and finally ran out of disk space (after
using up ~50GB).

Maybe I did something wrong?

What I did was: Cloned the related kernel repository, checked out offending
commit, plopped in config, compiled bzImage. Then I just cloned the lkp repo
and tried running the provided command line with the bzImage and provided
script.

I'll take another look once I have the time, might be I missed something
earlier.

Yeah, I'm not sure. Seems it was found through trinity? And only after
36 seconds, too.

I think I might have missed something here. I cannot find anything about
trinity or 36 seconds. I either misplaced or never got the original email,
so I'm not sure whether it had other attachments beyond the config and
script.


Also, why not atomic->refcount for sk_rmem_alloc?

I couldn't find any similar refcount-like use on sk_rmem_alloc.

Okay, interesting.

And as noted the sk_wmem_alloc thing is also a bit dubious. It looks like it
serves a dual purpose of actual allocation size and occasional reference
counter.
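
To make the dual use concrete, here is a minimal userspace sketch of how I
read it, not the kernel code itself. It assumes the counter starts at 1 (the
socket's own reference) and that the tx completion path subtracts the buffer
size and frees on zero, as the sk_free() comment above suggests. struct
model_sock and all model_* names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct sock; not the kernel type. */
struct model_sock {
	atomic_uint wmem_alloc;	/* models sk_wmem_alloc */
	bool freed;		/* set where __sk_free() would run */
};

static void model_sock_init(struct model_sock *sk)
{
	/* Assumption: the counter starts at 1, the socket's own reference. */
	atomic_init(&sk->wmem_alloc, 1);
	sk->freed = false;
}

/* Queueing a tx buffer: the counter grows by the buffer size (byte accounting). */
static void model_charge(struct model_sock *sk, unsigned int truesize)
{
	atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* Subtract n and report whether the counter reached zero. */
static bool model_sub_and_test(struct model_sock *sk, unsigned int n)
{
	/* atomic_fetch_sub returns the old value; old == n means new == 0. */
	return atomic_fetch_sub(&sk->wmem_alloc, n) == n;
}

static void model_free(struct model_sock *sk)
{
	assert(!sk->freed);	/* freeing twice would be a bug */
	sk->freed = true;
}

/* Models the tx completion path: uncharge the buffer, free if it was the last user. */
static void model_wfree(struct model_sock *sk, unsigned int truesize)
{
	if (model_sub_and_test(sk, truesize))
		model_free(sk);
}

/* Models sk_free(): drop the initial 1, free only if no tx bytes are pending. */
static void model_sk_free(struct model_sock *sk)
{
	if (model_sub_and_test(sk, 1))
		model_free(sk);
}
```

With one 512-byte buffer charged, model_sk_free() takes the counter from 513
to 512 and frees nothing; the later model_wfree(sk, 512) takes it to zero and
does the free. So the same variable holds both a byte count and a
"last user frees" reference.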

Could you ask net-dev to see what is actually happening here? This
looks like a regression, but also very odd (broken?) refcounting ...

Sure, but I'm unsure what exactly I should be asking. If you have any more
information on the trinity results, I'd be happy to look at that beforehand.

Thanks,
-hans


-Kees


--
Kees Cook
Pixel Security