Re: Bug 4.1.16: self-detected stall in net/unix/?

From: Philipp Hahn
Date: Fri Feb 05 2016 - 10:28:42 EST


Hello Hannes,

thank you for your previous reply.

Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa:
> On 02.02.2016 17:25, Philipp Hahn wrote:
>> we recently updated our kernel to 4.1.16 + patch for "unix: properly
>> account for FDs passed over unix sockets" and have since then
>> self-detected stalls triggered by the Samba daemon:
...
>> We have not yet been able to reproduce the hang, but going back to our
>> previous kernel 4.1.12 makes the problem go away.
>
> Can you remove the patch "unix: properly account for FDs passed over
> unix sockets" and see if the problem still happens?

I will try.
The problem is that I can't trigger the bug reliably. It always happens
to "smbd", but I don't know the triggering condition.

> I couldn't quickly see any problems with your added patch. I currently
> suspect a tight loop because of a SOCK_DEAD flag set but the socket not
> removed from unix_socket_table or the vfs. Hmmm...

All occurrences of the bug show _raw_spin_lock() as the sleeping
function, never anything other thus far.
I have seen the bug both on amd64 and i386.

If it would spin in
> 1097 restart:
> 1098 ÂÂÂÂÂÂÂÂother = unix_find_other(net, sunaddr, alen, sock->type, hash, &err);
> 1099 ÂÂÂÂÂÂÂÂif (!other)
> 1100 ÂÂÂÂÂÂÂÂÂÂÂÂgoto out;
> 1101
> 1102 ÂÂÂÂÂÂÂÂunix_state_double_lock(sk, other);
> 1103
> 1104 ÂÂÂÂÂÂÂÂ/* Apparently VFS overslept socket death. Retry. */
> 1105 ÂÂÂÂÂÂÂÂif (sock_flag(other, SOCK_DEAD)) {
> 1106 ÂÂÂÂÂÂÂÂÂÂÂÂunix_state_double_unlock(sk, other);
> 1107 ÂÂÂÂÂÂÂÂÂÂÂÂsock_put(other);
> 1108 ÂÂÂÂÂÂÂÂÂÂÂÂgoto restart;
> 1109 ÂÂÂÂÂÂÂÂ}

I would expect some other function to show up in the trace, but it's
always the call chain "unix_dgram_connect -> unix_state_double_lock ->
_raw_spin_lock".

The only case I see is that unix_find_other() returns the same "other"
again and again.

Q: How will that dead "other" be garbage collected finally? The kernel is
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set


During my hunt I found the following difference between 4.4 and 4.1.16
in net/unix/af_unix.c:

> commit 1586a5877db9eee313379738d6581bc7c6ffb5e3
> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
> Date: Fri Oct 23 10:59:16 2015 -0700
>
> af_unix: do not report POLLOUT on listeners
>
> poll(POLLOUT) on a listener should not report fd is ready for
> a write().
>
> This would break some applications using poll() and pfd.events = -1,
> as they would not block in poll()
>
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Reported-by: Alan Burlison <Alan.Burlison@xxxxxxxxxx>
> Tested-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
...
> -static inline int unix_writable(struct sock *sk)
> +static int unix_writable(const struct sock *sk)
> {
> - return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
> + return sk->sk_state != TCP_LISTEN &&
> + (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
> }

Q: That *TCP*_LISTEN in net/*unix*/ looks strange to my eyes, but I'm
not a kernel guru (yet).

There is one hunk in 5c77e26862ce604edea05b3442ed765e9756fe0f, which
uses the result of that function, which might explain the difference,
why it shows with 4.1.16 but not with 4.4?

> @@ -2245,14 +2388,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
> writable = unix_writable(sk);
> - other = unix_peer_get(sk);
> - if (other) {
> - if (unix_peer(other) != sk) {
> - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
> - if (unix_recvq_full(other))
> - writable = 0;
> - }
> - sock_put(other);
> + if (writable) {
> + unix_state_lock(sk);
> +
> + other = unix_peer(sk);
> + if (other && unix_peer(other) != sk &&
> + unix_recvq_full(other) &&
> + unix_dgram_peer_wake_me(sk, other))
> + writable = 0;
> +
> + unix_state_unlock(sk);
> }
> if (writable)


I will for now build a new kernel with
> $ git log --oneline v4.1.12..v4.1.17 -- net/unix
> dc6b0ec unix: properly account for FDs passed over unix sockets
> cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code
> 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue
reverted to see if it still happens. The "middle" patch seems harmless,
as it only changes a code path for STREAMS, while the bug triggers with
DGRAMS only.

> The stack trace is rather unreliable, maybe something completely
> different happend. Do you happend to see better reports?

So far they look all the same.
Anything more I can do to prepare for collection better information next
time I get that bug?

Thanks again.

Philipp