Re: [PATCH v1 bpf-next 05/11] tcp: Migrate TCP_NEW_SYN_RECV requests.

From: Eric Dumazet
Date: Tue Dec 01 2020 - 10:14:34 EST




On 12/1/20 3:44 PM, Kuniyuki Iwashima wrote:
> This patch renames reuseport_select_sock() to __reuseport_select_sock() and
> adds two wrapper function of it to pass the migration type defined in the
> previous commit.
>
> reuseport_select_sock : BPF_SK_REUSEPORT_MIGRATE_NO
> reuseport_select_migrated_sock : BPF_SK_REUSEPORT_MIGRATE_REQUEST
>
> As mentioned before, we have to select a new listener for TCP_NEW_SYN_RECV
> requests at receiving the final ACK or sending a SYN+ACK. Therefore, this
> patch also changes the code to call reuseport_select_migrated_sock() even
> if the listening socket is TCP_CLOSE. If we can pick out a listening socket
> from the reuseport group, we rewrite request_sock.rsk_listener and resume
> processing the request.
>
> Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> ---
> include/net/inet_connection_sock.h | 12 +++++++++++
> include/net/request_sock.h | 13 ++++++++++++
> include/net/sock_reuseport.h | 8 +++----
> net/core/sock_reuseport.c | 34 ++++++++++++++++++++++++------
> net/ipv4/inet_connection_sock.c | 13 ++++++++++--
> net/ipv4/tcp_ipv4.c | 9 ++++++--
> net/ipv6/tcp_ipv6.c | 9 ++++++--
> 7 files changed, 81 insertions(+), 17 deletions(-)
>
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 2ea2d743f8fc..1e0958f5eb21 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -272,6 +272,18 @@ static inline void inet_csk_reqsk_queue_added(struct sock *sk)
> reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue);
> }
>
> +static inline void inet_csk_reqsk_queue_migrated(struct sock *sk,
> + struct sock *nsk,
> + struct request_sock *req)
> +{
> + reqsk_queue_migrated(&inet_csk(sk)->icsk_accept_queue,
> + &inet_csk(nsk)->icsk_accept_queue,
> + req);
> + sock_put(sk);
> + sock_hold(nsk);

This looks racy to me. nsk refcount might be zero at this point.

If you think it can _not_ be zero, please add a big comment here,
because this would mean something has been done before reaching this function,
and this sock_hold() would be not needed in the first place.

There is a good reason reqsk_alloc() is using refcount_inc_not_zero().

> + req->rsk_listener = nsk;
> +}
> +

Honestly, this patch series looks quite complex, and finding a bug in the
very first function I am looking at is not really a good sign...