Re: [syzbot] [net?] INFO: task hung in tls_sw_sendpage (3)

From: Eric Dumazet
Date: Tue Feb 28 2023 - 06:25:20 EST


On Tue, Feb 28, 2023 at 12:53 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Mon, 27 Feb 2023 21:35:41 +0100 Eric Dumazet wrote:
> > This looks suspicious to me
> >
> > commit 79ffe6087e9145d2377385cac48d0d6a6b4225a5
> > Author: Jakub Kicinski <kuba@xxxxxxxxxx>
> > Date: Tue Nov 5 14:24:35 2019 -0800
> >
> > net/tls: add a TX lock
> >
> >
> > If tls_sw_sendpage() has to call sk_stream_wait_memory(),
> > sk_stream_wait_memory() is properly releasing the socket lock,
> > but knows nothing about mutex_{un}lock(&tls_ctx->tx_lock);
>
> That's supposed to be the point of the lock, prevent new writers from
> messing with the partially pushed records when the original writer
> is waiting for write space.
>
> Obvious hack but the async crypto support makes TLS a bit of a mess :|
>
> sendpage_lock not taking tx_lock may lead to obvious problems, I'm not
> seeing where the deadlock is, tho..
>

This report mentions sendpage, but sendmsg() would have the same issue.

A thread might be blocked in sk_stream_wait_memory() with the mutex
held for an arbitrary amount of time, for example if the remote peer
keeps advertising a zero receive window (RWIN 0) for hours.

This prevents tx_work from making progress, and
tls_sw_cancel_work_tx() would be stuck forever.
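
To make the lock ordering concrete, here is a rough userspace sketch
using pthreads. It is not kernel code: struct model_tls_ctx and all
model_* helpers are made-up names, the two pthread mutexes only stand
in for tls_ctx->tx_lock and the socket lock, and the sketch assumes the
tx_work handler needs tx_lock before it can do anything. It only shows
that the "wait for memory" step frees the socket lock while tx_lock
stays held.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only: both mutexes are stand-ins, not kernel objects. */
struct model_tls_ctx {
	pthread_mutex_t tx_lock; /* stands in for tls_ctx->tx_lock */
	pthread_mutex_t sk_lock; /* stands in for the socket lock */
};

/* Use NORMAL mutexes so that trylock on a held mutex fails with EBUSY. */
void model_ctx_init(struct model_tls_ctx *ctx)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	assert(err == 0);
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
	assert(err == 0);
	err = pthread_mutex_init(&ctx->tx_lock, &attr);
	assert(err == 0);
	err = pthread_mutex_init(&ctx->sk_lock, &attr);
	assert(err == 0);
	(void)err;
	pthread_mutexattr_destroy(&attr);
}

/* Writer path up to the point where it sleeps waiting for write space. */
void model_writer_block(struct model_tls_ctx *ctx)
{
	pthread_mutex_lock(&ctx->tx_lock);
	pthread_mutex_lock(&ctx->sk_lock);
	/* sk_stream_wait_memory() analogue: only the socket lock is dropped. */
	pthread_mutex_unlock(&ctx->sk_lock);
	/* The writer would now sleep here, still owning tx_lock. */
}

/* Writer is woken up, retakes the socket lock, then releases everything. */
void model_writer_finish(struct model_tls_ctx *ctx)
{
	pthread_mutex_lock(&ctx->sk_lock);
	pthread_mutex_unlock(&ctx->sk_lock);
	pthread_mutex_unlock(&ctx->tx_lock);
}

/* Is the socket lock available to other users right now? */
bool model_sk_lock_is_free(struct model_tls_ctx *ctx)
{
	if (pthread_mutex_trylock(&ctx->sk_lock) != 0)
		return false;
	pthread_mutex_unlock(&ctx->sk_lock);
	return true;
}

/* Could a tx_work-like handler that needs tx_lock make progress now? */
bool model_tx_work_can_run(struct model_tls_ctx *ctx)
{
	if (pthread_mutex_trylock(&ctx->tx_lock) != 0)
		return false;
	pthread_mutex_unlock(&ctx->tx_lock);
	return true;
}
```

After model_writer_block(), model_sk_lock_is_free() returns true while
model_tx_work_can_run() returns false, and it stays false until
model_writer_finish() runs. In the real code that last step depends on
the peer opening its window again.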

The consensus is that the kernel shouts a warning if a thread has been
waiting on a mutex for more than 120 seconds
(see check_hung_uninterruptible_tasks()).