Re: Re: [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close()

From: duoming
Date: Thu Apr 07 2022 - 07:06:48 EST


Hello,

On Thu, 7 Apr 2022 00:21:58 -0700 Max Filippov wrote:

> > There is a deadlock in rs_close(), which is shown
> > below:
> >
> >    (Thread 1)              |      (Thread 2)
> >                            | rs_open()
> > rs_close()                 |  mod_timer()
> >  spin_lock_bh() //(1)      |  (wait a time)
> >  ...                       | rs_poll()
> >  del_timer_sync()          |  spin_lock() //(2)
> >  (wait timer to stop)      |  ...
> >
> > We hold timer_lock at position (1) in thread 1 and
> > use del_timer_sync() to wait for the timer to stop, but the timer
> > handler also needs timer_lock at position (2) in thread 2.
> > As a result, rs_close() will block forever.
>
> I agree with this.
>
> > This patch moves del_timer_sync() out of the region protected by
> > spin_lock_bh(), which lets the timer handler obtain
> > the needed lock.
>
> Looking at the timer_lock I don't really understand what it protects.
> It looks like it is not needed at all.

There is no race condition between rs_close() and rs_poll() (the timer handler),
so I think we could remove the timer_lock in rs_close(), rs_open() and rs_poll().
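
To illustrate the ordering, here is a minimal userspace sketch using pthreads
(it is not kernel code). It assumes a mutex as a stand-in for timer_lock and
pthread_join() as a stand-in for the "wait for the handler" part of
del_timer_sync(). All model_* names are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t timer_thread;
static atomic_bool timer_armed;
static atomic_int poll_count;

/* Models rs_poll(): every tick needs timer_lock, as at position (2). */
static void *model_poll(void *unused)
{
	(void)unused;
	while (atomic_load(&timer_armed)) {
		pthread_mutex_lock(&timer_lock);
		atomic_fetch_add(&poll_count, 1);
		pthread_mutex_unlock(&timer_lock);
		usleep(1000);	/* models the delay until the next tick */
	}
	return NULL;
}

/* Models rs_open(): arm the periodic handler. */
static void model_open(void)
{
	int err;

	atomic_store(&poll_count, 0);
	atomic_store(&timer_armed, true);
	err = pthread_create(&timer_thread, NULL, model_poll, NULL);
	assert(err == 0);
	(void)err;
}

/* Models del_timer_sync(): disarm, then wait for a running handler. */
static void model_del_timer_sync(void)
{
	atomic_store(&timer_armed, false);
	pthread_join(timer_thread, NULL);
}

/*
 * While the closer holds timer_lock the handler cannot make progress.
 * Returns the number of ticks observed during the hold.
 */
static int model_ticks_while_locked(void)
{
	int before, after;

	pthread_mutex_lock(&timer_lock);
	before = atomic_load(&poll_count);
	usleep(20000);
	after = atomic_load(&poll_count);
	pthread_mutex_unlock(&timer_lock);
	return after - before;
}

/* Models rs_close() without timer_lock: nothing is held while waiting. */
static void model_close(void)
{
	model_del_timer_sync();
}
```

Calling model_open(), then model_ticks_while_locked(), then model_close()
returns normally, and the handler does not advance while the lock is held.
If model_close() instead called model_del_timer_sync() with timer_lock held,
the join could block forever once the handler is waiting for the lock, which
is the same cycle as in the diagram above.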

> Also, I see that rs_poll rewinds the timer regardless of whether del_timer_sync
> was called or not, which violates del_timer_sync requirements.

I wrote a kernel module to test whether del_timer_sync() can stop a timer whose
handler uses mod_timer() to re-arm itself. The result is as follows.

# insmod del_timer_sync.ko
[ 929.374405] my_timer will be create.
[ 929.374738] the jiffies is :4295595572
[ 930.411581] In my_timer_function
[ 930.411956] the jiffies is 4295596609
[ 935.466643] In my_timer_function
[ 935.467505] the jiffies is 4295601665
[ 940.586538] In my_timer_function
[ 940.586916] the jiffies is 4295606784
[ 945.706579] In my_timer_function
[ 945.706885] the jiffies is 4295611904

#
# rmmod del_timer_sync.ko
[ 948.507692] the del_timer_sync is :1
[ 948.507692]
#
#

The experiment shows that del_timer_sync() stops the timer even though the
handler re-arms itself with mod_timer(): the handler ran about every five
seconds until rmmod, and did not run again afterwards.
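
For reference, a test module along these lines could look roughly like the
sketch below. This is a reconstruction, not the exact module: the names
my_timer and my_timer_function are taken from the log, while the init/exit
function names and the 1 s / 5 s delays are assumptions inferred from the
timestamps. It needs a kernel build tree to compile.

```c
/* Hypothetical test module; names and intervals are inferred from the log. */
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/timer.h>

static struct timer_list my_timer;

/* The handler re-arms itself, like rs_poll() does with serial_timer. */
static void my_timer_function(struct timer_list *t)
{
	pr_info("In my_timer_function\n");
	pr_info("the jiffies is %lu\n", jiffies);
	mod_timer(&my_timer, jiffies + 5 * HZ);	/* assumed 5 s period */
}

static int __init my_timer_test_init(void)
{
	pr_info("my_timer will be create.\n");
	pr_info("the jiffies is :%lu\n", jiffies);
	timer_setup(&my_timer, my_timer_function, 0);
	mod_timer(&my_timer, jiffies + HZ);	/* assumed 1 s first delay */
	return 0;
}

static void __exit my_timer_test_exit(void)
{
	/* Returns 1 if the timer was still pending when deactivated. */
	int ret = del_timer_sync(&my_timer);

	pr_info("the del_timer_sync is :%d\n", ret);
}

module_init(my_timer_test_init);
module_exit(my_timer_test_exit);
MODULE_LICENSE("GPL");
```

Such a test only shows that the timer stopped in this particular run. It does
not by itself answer the point above that del_timer_sync() expects callers to
prevent the timer from being re-armed.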

> > Signed-off-by: Duoming Zhou <duoming@xxxxxxxxxx>
> > ---
> > arch/xtensa/platforms/iss/console.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
> > index 81d7c7e8f7e..d431b61ae3c 100644
> > --- a/arch/xtensa/platforms/iss/console.c
> > +++ b/arch/xtensa/platforms/iss/console.c
> > @@ -51,8 +51,10 @@ static int rs_open(struct tty_struct *tty, struct file * filp)
> >  static void rs_close(struct tty_struct *tty, struct file * filp)
> >  {
> >  	spin_lock_bh(&timer_lock);
> > -	if (tty->count == 1)
> > +	if (tty->count == 1) {
> > +		spin_unlock_bh(&timer_lock);
> >  		del_timer_sync(&serial_timer);
> > +	}
> >  	spin_unlock_bh(&timer_lock);
>
> Now in case tty->count == 1 the timer_lock would be unlocked twice.

I will remove the timer_lock in rs_close(), rs_open() and rs_poll().

Thanks a lot for your time and advice!

Best regards,
Duoming Zhou