Re: [PATCH 01/10] nbd: Fix timeout detection

From: Ben Hutchings
Date: Sun Sep 27 2015 - 20:28:00 EST


On Mon, 2015-08-17 at 08:20 +0200, Markus Pargmann wrote:
> At the moment the nbd timeout just detects hanging TCP operations. This
> is not enough to detect a hanging or bad connection, which is what a
> timeout is expected to do.
>
> This patch redesigns the timeout detection to include some more cases.
> The timeout is now in relation to replies from the server. If the server
> does not send replies within the timeout the connection will be shut
> down.
>
> The patch adds a continuous timer 'timeout_timer' that is set up in one of
> two cases:
> - The request list is empty and we are sending the first request out to
>   the server. We want to have a reply within the given timeout,
>   otherwise we consider the connection to be dead.
> - A server response was received. This means the server is still
>   communicating with us. The timer is reset to the timeout value.
>
> The timer is not stopped if the list becomes empty. It will just trigger
> a timeout which will directly leave the handling routine again as the
> request list is empty.
>
> The whole patch does not use any additional explicit locking. The
> list_empty() calls are safe to be used concurrently. The timer is locked
> internally as we just use mod_timer() and del_timer_sync().

This is crazy. The timer is locked internally but the tasks are not.
So it is possible for the timeout handler to kill a task after it
exited from nbd_do_it()/nbd_thread_recv(), or after it exited entirely
(use-after-free).
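
To make the race concrete, here is a minimal userspace sketch of one way to
close it. It is not the patch author's code and not a proposed fix for nbd.
The names struct worker, struct dev, tasks_lock, recv_enter(), recv_exit()
and timeout_fire() are all hypothetical, and a pthread mutex stands in for
whatever lock the driver would use (a kernel timer callback cannot sleep, so
a spinlock would take its place there). The idea is only that the thread
unpublishes itself under the same lock the timeout handler holds while it
looks up and signals the task.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct worker {
	int signalled;
};

/* Hypothetical device state; tasks_lock guards task_recv. */
struct dev {
	pthread_mutex_t tasks_lock;
	struct worker *task_recv;
};

/* The receive thread publishes itself when it starts work... */
static void recv_enter(struct dev *d, struct worker *self)
{
	pthread_mutex_lock(&d->tasks_lock);
	d->task_recv = self;
	pthread_mutex_unlock(&d->tasks_lock);
}

/* ...and unpublishes itself before it returns or exits. */
static void recv_exit(struct dev *d)
{
	pthread_mutex_lock(&d->tasks_lock);
	d->task_recv = NULL;
	pthread_mutex_unlock(&d->tasks_lock);
}

/*
 * Timeout handler: the lookup and the signal happen under the same lock,
 * so a thread that has already run recv_exit() can no longer be signalled.
 * Returns 1 if a thread was signalled, 0 otherwise.
 */
static int timeout_fire(struct dev *d)
{
	int hit = 0;

	pthread_mutex_lock(&d->tasks_lock);
	if (d->task_recv) {
		d->task_recv->signalled = 1;	/* stands in for force_sig() */
		hit = 1;
	}
	pthread_mutex_unlock(&d->tasks_lock);
	return hit;
}
```

With this ordering, a timeout that fires after recv_exit() finds a NULL
pointer and does nothing, instead of signalling a task that has left the
receive loop.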

[...]
> +	task = READ_ONCE(nbd->task_send);
> +	if (task)
> +		force_sig(SIGKILL, nbd->task_send);
[...]

And this is just... what? What is the point of using READ_ONCE() if
you're going to look up nbd->task_send again?
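
For comparison, a minimal userspace sketch of the snapshot pattern that
READ_ONCE() is meant for: load the pointer once, then test and use only the
local copy. The READ_ONCE() macro below is a simplified stand-in for the
kernel's, and struct task, struct nbd_dev, kill_task() and
timeout_kill_sender() are hypothetical names for illustration. This only
addresses the double lookup. A snapshot by itself does not keep the task
alive, so the lifetime problem described above still needs its own fix.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct task_struct. */
struct task {
	int killed;
};

/* Hypothetical stand-in for the nbd device, modelling nbd->task_send. */
struct nbd_dev {
	struct task *task_send;
};

/* Hypothetical stand-in for force_sig(SIGKILL, task). */
static void kill_task(struct task *t)
{
	t->killed = 1;
}

/* Returns 1 if a task was signalled, 0 if none was registered. */
static int timeout_kill_sender(struct nbd_dev *nbd)
{
	/* Single load: the NULL check and the use see the same value. */
	struct task *task = READ_ONCE(nbd->task_send);

	if (!task)
		return 0;

	kill_task(task);	/* use the snapshot, not nbd->task_send */
	return 1;
}
```

If nbd->task_send is cleared between the check and the call, the quoted
version can pass NULL to force_sig(), while the version above keeps using
the value it actually checked.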

Ben.

--
Ben Hutchings
All extremists should be taken out and shot.
