Re: the qemu-nbd process automatically exit with the commit 43347d56c 'livepatch: send a fake signal to all blocking tasks'

From: Miroslav Benes
Date: Thu Apr 15 2021 - 04:37:57 EST


On Wed, 14 Apr 2021, Josef Bacik wrote:

> On 4/14/21 11:21 AM, xiaojun.zhao141@xxxxxxxxx wrote:
> > On Wed, 14 Apr 2021 13:27:43 +0200 (CEST)
> > Miroslav Benes <mbenes@xxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> On Wed, 14 Apr 2021, xiaojun.zhao141@xxxxxxxxx wrote:
> >>
> >>> I found that the qemu-nbd process (started with qemu-nbd -t -c /dev/nbd0
> >>> nbd.qcow2) exits automatically when I livepatch nbd functions.
> >>>
> >>> The relevant nbd source:
> >>> static int nbd_start_device_ioctl(struct nbd_device *nbd,
> >>> 				  struct block_device *bdev)
> >>> {
> >>> 	struct nbd_config *config = nbd->config;
> >>> 	int ret;
> >>>
> >>> 	ret = nbd_start_device(nbd);
> >>> 	if (ret)
> >>> 		return ret;
> >>>
> >>> 	if (max_part)
> >>> 		bdev->bd_invalidated = 1;
> >>> 	mutex_unlock(&nbd->config_lock);
> >>> 	ret = wait_event_interruptible(config->recv_wq,
> >>> 			atomic_read(&config->recv_threads) == 0);
> >>> 	if (ret)
> >>> 		sock_shutdown(nbd);
> >>> 	flush_workqueue(nbd->recv_workq);
> >>>
> >>> 	mutex_lock(&nbd->config_lock);
> >>> 	nbd_bdev_reset(bdev);
> >>> 	/* user requested, ignore socket errors */
> >>> 	if (test_bit(NBD_RT_DISCONNECT_REQUESTED, &config->runtime_flags))
> >>> 		ret = 0;
> >>> 	if (test_bit(NBD_RT_TIMEDOUT, &config->runtime_flags))
> >>> 		ret = -ETIMEDOUT;
> >>> 	return ret;
> >>> }
> >>
> >> So my understanding is that nbd spawns a number
> >> (config->recv_threads) of workqueue jobs and then waits for them to
> >> finish. It waits interruptibly. Now, any signal makes
> >> wait_event_interruptible() return -ERESTARTSYS. The livepatch fake
> >> signal is no exception. The error is then propagated back to
> >> userspace, unless the user requested a disconnection or a timeout
> >> occurred. How does userspace then react to it? Is
> >> _interruptible there because userspace sends a signal when
> >> NBD_RT_DISCONNECT_REQUESTED is set? How does userspace handle
> >> ordinary signals? This all sounds a bit strange, but I may easily be
> >> missing something.
> >>
> >>> While nbd waits for atomic_read(&config->recv_threads) == 0, klp
> >>> sends a fake signal to it and the qemu-nbd process then exits.
> >>> The sysfs signal attribute to control this action was removed in
> >>> commit 10b3d52790e 'livepatch: Remove signal sysfs attribute'. Are
> >>> there other ways to control this action? How?
> >>
> >> No, there is no way currently. We send a fake signal automatically.
> >>
> >> Regards
> >> Miroslav
> > An IO error occurs on the nbd device when I livepatch nbd, and I
> > guess that a livepatch for any other kernel code may cause the same
> > IO error. For now I have decided to work around this problem by
> > adding a livepatch for klp that disables the automatic fake signal.
> >
>
> Would wait_event_killable() fix this problem? I'm not sure any client
> implementations depend on being able to send other signals to the client
> process, so it should be safe from that standpoint. Not sure if the livepatch
> thing would still get an error at that point tho. Thanks,

wait_event_killable() means that you would sleep uninterruptibly (still
reacting to fatal signals), so the fake signal from livepatch would not
be delivered at all: set_notify_signal() only wakes TASK_INTERRUPTIBLE
tasks. There would be no disruption for userspace and it would fix this
problem.
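
To make the difference concrete, here is a small userspace model of how
the return value of nbd_start_device_ioctl() would change. It is only a
sketch, not kernel code: the names wait_kind, pending_kind, model_wait()
and model_ioctl_ret() are made up for illustration, and ERESTARTSYS is
defined locally (512, as in include/linux/errno.h) because it is not
exported to userspace.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal restart code, not visible in userspace headers. */
#define ERESTARTSYS 512

/* Hypothetical names, for illustration only. */
enum wait_kind { WAIT_INTERRUPTIBLE, WAIT_KILLABLE };
enum pending_kind { PEND_NONE, PEND_FAKE, PEND_ORDINARY, PEND_FATAL };

/*
 * Model of the wait itself: 0 means the wait ended because
 * recv_threads reached zero, -ERESTARTSYS means a signal ended it.
 */
int model_wait(enum wait_kind wait, enum pending_kind pending)
{
	switch (pending) {
	case PEND_NONE:
		return 0;
	case PEND_FATAL:
		/* Fatal signals end both kinds of wait. */
		return -ERESTARTSYS;
	default:
		/* Fake and ordinary signals only end an interruptible wait. */
		return wait == WAIT_INTERRUPTIBLE ? -ERESTARTSYS : 0;
	}
}

/* Model of the tail of nbd_start_device_ioctl() after the wait. */
int model_ioctl_ret(enum wait_kind wait, enum pending_kind pending,
		    bool disconnect_requested, bool timed_out)
{
	int ret = model_wait(wait, pending);

	/* user requested, ignore socket errors */
	if (disconnect_requested)
		ret = 0;
	if (timed_out)
		ret = -ETIMEDOUT;
	return ret;
}
```

With WAIT_INTERRUPTIBLE and PEND_FAKE the model returns -ERESTARTSYS,
which is the error that is propagated today. With WAIT_KILLABLE the same
call returns 0, and only PEND_FATAL still ends the wait early.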

There is a catch on the livepatch side of things. If there is a live patch
for nbd_start_device_ioctl(), the transition process would get stuck until
the task leaves the function (that is, until all workqueue jobs are
processed). I gather that is unlikely to take indefinitely long, so we can
live with that, I think.

Miroslav