Re: [PATCH] tty: vt: make do_con_write() no-op if IRQ is disabled

From: Tetsuo Handa
Date: Fri Dec 03 2021 - 07:32:39 EST


On 2021/12/03 20:00, Fabio M. De Francesco wrote:
> On Thursday, December 2, 2021 7:35:16 PM CET Linus Torvalds wrote:
>> On Thu, Dec 2, 2021 at 7:41 AM Tetsuo Handa
>> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> Looking at the backtrace, I see
>>>>
>>>> n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
>>>> tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
>>>> __start_tty drivers/tty/tty_io.c:806 [inline]
>>>> __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
>>>>
>>>> and apparently it's that hdlc line discipline (and
>>>> n_hdlc_send_frames() in particular) that is the problem here.
>>>>
>>>> I think that's where the fix should be.
>>>
>>> Do you mean that we should change the behavior of n_hdlc_send_frames()
>>> rather than trying to make __start_tty() schedulable again?
>>
>> I wouldn't change n_hdlc_send_frames() itself. It does what it says it does.
>>
>> But n_hdlc_tty_wakeup() probably shouldn't call it directly. Other tty
>> line disciplines don't do that kind of thing - although I only looked
>> at a couple. They all seem to just set bits and prepare things. Like a
>> wakeup function should do.
>>
>> So I think n_hdlc_tty_wakeup() should perhaps only do a
>> "schedule_work()" or similar to get that n_hdlc_send_frames() started,
>> rather than doing it itself.
>>
>> Example: net/nfc/nci/uart.c. It does that
>>
>> schedule_work(&nu->write_work);
>>
>> instead of actually trying to do a write from a wakeup routine
>> (similar examples in ppp - "tasklet_schedule(&ap->tsk)" etc).
>>
>> I mean, it's called "wakeup", not "write". So I think the fundamental
>> confusion here is in hdlc, not the tty layer.
>>
>> Linus
>>

OK.

> This is what I understand from the above argument: do a schedule_work() to get
> that n_hdlc_send_frames() started; in this way, n_hdlc_tty_wakeup() can
> return to the caller and n_hdlc_send_frames() is executed asynchronously
> (i.e., no longer in an atomic context).

Yes. If we copy what net/nfc/nci/uart.c does, the fix would look like this:

--------------------
drivers/tty/n_hdlc.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 7e0884ecc74f..a71fcac60925 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -140,6 +140,8 @@ struct n_hdlc {
 	struct n_hdlc_buf_list rx_buf_list;
 	struct n_hdlc_buf_list tx_free_buf_list;
 	struct n_hdlc_buf_list rx_free_buf_list;
+	struct work_struct write_work;
+	struct tty_struct *tty_for_write_work;
 };

 /*
@@ -210,6 +212,8 @@ static void n_hdlc_tty_close(struct tty_struct *tty)
 	wake_up_interruptible(&tty->read_wait);
 	wake_up_interruptible(&tty->write_wait);

+	cancel_work_sync(&n_hdlc->write_work);
+
 	n_hdlc_free_buf_list(&n_hdlc->rx_free_buf_list);
 	n_hdlc_free_buf_list(&n_hdlc->tx_free_buf_list);
 	n_hdlc_free_buf_list(&n_hdlc->rx_buf_list);
@@ -334,6 +338,20 @@ static void n_hdlc_send_frames(struct n_hdlc *n_hdlc, struct tty_struct *tty)
 		goto check_again;
 } /* end of n_hdlc_send_frames() */

+/**
+ * n_hdlc_tty_write_work - Asynchronous callback for transmit wakeup
+ * @work: pointer to work_struct
+ *
+ * Called when low level device driver can accept more send data.
+ */
+static void n_hdlc_tty_write_work(struct work_struct *work)
+{
+	struct n_hdlc *n_hdlc = container_of(work, struct n_hdlc, write_work);
+	struct tty_struct *tty = n_hdlc->tty_for_write_work;
+
+	n_hdlc_send_frames(n_hdlc, tty);
+} /* end of n_hdlc_tty_write_work() */
+
 /**
  * n_hdlc_tty_wakeup - Callback for transmit wakeup
  * @tty: pointer to associated tty instance data
@@ -344,7 +362,8 @@ static void n_hdlc_tty_wakeup(struct tty_struct *tty)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;

-	n_hdlc_send_frames(n_hdlc, tty);
+	n_hdlc->tty_for_write_work = tty;
+	schedule_work(&n_hdlc->write_work);
 } /* end of n_hdlc_tty_wakeup() */

 /**
@@ -706,6 +725,7 @@ static struct n_hdlc *n_hdlc_alloc(void)
 	if (!n_hdlc)
 		return NULL;

+	INIT_WORK(&n_hdlc->write_work, n_hdlc_tty_write_work);
 	spin_lock_init(&n_hdlc->rx_free_buf_list.spinlock);
 	spin_lock_init(&n_hdlc->tx_free_buf_list.spinlock);
 	spin_lock_init(&n_hdlc->rx_buf_list.spinlock);
--------------------
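
For anyone who wants to see the idea outside the kernel, here is a minimal
userspace sketch of what the patch changes: the wakeup callback only records
that work is needed, and the sending happens later from another context.
All model_* names are hypothetical and exist only for this illustration; the
real patch uses schedule_work() and a work_struct as in the diff above.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_ldisc {
	bool work_pending;	/* stands in for a queued work_struct */
	int frames_queued;	/* frames waiting to be sent */
	int frames_sent;	/* frames already handed to the "driver" */
};

/* Stand-in for n_hdlc_send_frames(), which must not run in atomic context. */
static void model_send_frames(struct model_ldisc *d)
{
	d->frames_sent += d->frames_queued;
	d->frames_queued = 0;
}

/* Stand-in for the wakeup callback: it only marks the work as pending. */
static void model_wakeup(struct model_ldisc *d)
{
	d->work_pending = true;
}

/* Stand-in for the worker thread picking up the pending work later. */
static void model_run_pending_work(struct model_ldisc *d)
{
	if (!d->work_pending)
		return;
	d->work_pending = false;
	model_send_frames(d);
}
```

In this model nothing is sent at the time model_wakeup() returns; the frames
go out only when model_run_pending_work() is called, which is the part that
would run in schedulable context.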

>
> I hope that I'm not missing something. If the above summary is correct,
> please forgive a newbie for the following questions...
>
> Commit f9e053dcfc02 ("tty: Serialize tty flow control changes with flow_lock")
> has introduced spinlocks to serialize flow control changes and avoid the
> concurrent executions of __start_tty() and __stop_tty().
>
> This is an excerpt from the above-mentioned commit:
>
> ->
> Introduce tty->flow_lock spinlock to serialize tty flow control changes.
> Split out unlocked __start_tty()/__stop_tty() flavors for use by
> ioctl(TCXONC) in follow-on patch.
> <-
>
> This is the reason why we are dealing with this bug. Currently we have __start_tty()
> called with an acquired spinlock and IRQs disabled and the calls chain leads to
> console_lock() while in atomic context.

If we hit the race window described in that commit:

  CPU 0              | CPU 1
  stop_tty()         |
    lock ctrl_lock   |
    tty->stopped = 1 |
    unlock ctrl_lock |
                     | start_tty()
                     |   lock ctrl_lock
                     |   tty->stopped = 0
                     |   unlock ctrl_lock
                     |   driver->start()
    driver->stop()   |

  In this case, the flow control state now indicates the tty has
  been started, but the actual hardware state has actually been stopped.

then the tty->stopped flag remains 0 even though driver->stop() is called after
driver->start() has finished. tty->stopped (the flow control state) says "not
stopped" but the actual hardware state is "stopped".

>
> In summation, my questions are...
>
> 1) Why do we still need to protect __start_tty() and __stop_tty() with spin_lock_irq()
> if the solution to the bug is to execute n_hdlc_send_frames() asynchronously?

Without serialization, the tty->stopped flag and the actual hardware state can
get out of sync.
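
To make the mismatch concrete, here is a small userspace sketch that replays
the interleaving from that commit message step by step. It is only a model
under my own assumptions: struct model_tty, hw_stopped and the replay_*
helpers are hypothetical names, and no real locking is involved.

```c
#include <stdbool.h>

/* Hypothetical model: "stopped" is the flag, "hw_stopped" is the device. */
struct model_tty {
	bool stopped;
	bool hw_stopped;
};

/* Each operation is split in two halves so an interleaving can be replayed. */
static void flag_stop(struct model_tty *t)  { t->stopped = true; }
static void hw_stop(struct model_tty *t)    { t->hw_stopped = true; }
static void flag_start(struct model_tty *t) { t->stopped = false; }
static void hw_start(struct model_tty *t)   { t->hw_stopped = false; }

/* Unserialized: CPU 1's start runs between the two halves of CPU 0's stop. */
static void replay_racy(struct model_tty *t)
{
	flag_stop(t);	/* CPU 0 */
	flag_start(t);	/* CPU 1 */
	hw_start(t);	/* CPU 1 */
	hw_stop(t);	/* CPU 0 */
}

/* Serialized: each operation completes both halves before the next starts. */
static void replay_serialized(struct model_tty *t)
{
	flag_stop(t);	/* CPU 0, lock held across both halves */
	hw_stop(t);
	flag_start(t);	/* CPU 1, lock held across both halves */
	hw_start(t);
}
```

After replay_racy() the flag says "not stopped" while the modelled hardware is
stopped; after replay_serialized() the two agree.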

>
> 2) If it is true that we need to avoid concurrent executions of __start_tty() and
> __stop_tty(), can we just use a Mutex in the IOCTL's helper?

Yes, if all callers of __start_tty() and __stop_tty() were in schedulable context.
But stop_tty() is documented as possibly being called from atomic context.
Thus, we can't use a mutex to protect the tty->stopped flag.

>
> Thanks,
>
> Fabio M. De Francesco

By the way, even with the above patch, I think

  CPU 0              | CPU 1              | CPU 2
  stop_tty()         |                    |
    lock flow.lock   |                    |
    tty->stopped = 1 |                    |
    driver->stop()   |                    |
    unlock flow.lock |                    |
                     | start_tty()        |
                     |   lock flow.lock   |
                     |   tty->stopped = 0 |
                     |   driver->start() => Schedules n_hdlc_send_frames()
                     |   unlock flow.lock |
  stop_tty()         |                    |
    lock flow.lock   |                    |
    tty->stopped = 1 |                    |
    driver->stop()   |                    |
    unlock flow.lock |                    |
                     |                    | Starts n_hdlc_send_frames()

(that is, n_hdlc is writing to consoles even though tty->stopped is 1) can happen
until n_hdlc_send_frames() completes.

Then, is it even possible to schedule the next n_hdlc_send_frames() while the
previous n_hdlc_send_frames() is still running? In the worst case, can multiple
CPUs run n_hdlc_send_frames() concurrently?

  CPU 0              | CPU 1              | CPU 2                       | CPU 3
  stop_tty()         |                    |                             |
    lock flow.lock   |                    |                             |
    tty->stopped = 1 |                    |                             |
    driver->stop()   |                    |                             |
    unlock flow.lock |                    |                             |
                     | start_tty()        |                             |
                     |   lock flow.lock   |                             |
                     |   tty->stopped = 0 |                             |
                     |   driver->start() => Schedules n_hdlc_send_frames()
                     |   unlock flow.lock |                             |
                     |                    | Starts n_hdlc_send_frames() |
  stop_tty()         |                    |                             |
    lock flow.lock   |                    |                             |
    tty->stopped = 1 |                    |                             |
    driver->stop()   |                    |                             |
    unlock flow.lock |                    |                             |
                     | start_tty()        |                             |
                     |   lock flow.lock   |                             |
                     |   tty->stopped = 0 |                             |
                     |   driver->start() => Schedules n_hdlc_send_frames()
                     |   unlock flow.lock |                             |
                     |                    |                             | Starts n_hdlc_send_frames()

Ah, OK. n_hdlc->tbusy is there for serialization.
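
A rough userspace sketch of how I understand that guard to work, in case it
helps: a second caller that arrives while the first is busy does not send
anything itself, it only asks the first one to loop again. This is a
single-threaded model written from memory, not the driver code. The woke_up
field is my recollection of n_hdlc.c and should be treated as an assumption;
model_hdlc, passes and reentries_left are hypothetical, and the real function
does the flag checks under a spinlock.

```c
#include <stdbool.h>

/* Hypothetical model of the re-entry guard; not the driver's struct. */
struct model_hdlc {
	bool tbusy;		/* a sender is already inside the loop */
	bool woke_up;		/* another pass was requested meanwhile */
	int passes;		/* how many times the send body ran */
	int reentries_left;	/* test hook: simulated nested wakeups */
};

static void model_send_frames(struct model_hdlc *h)
{
check_again:
	/* The real code would hold a spinlock around this check. */
	if (h->tbusy) {
		h->woke_up = true;	/* ask the busy sender to loop again */
		return;
	}
	h->tbusy = true;
	h->woke_up = false;

	h->passes++;			/* stands in for sending frames */

	/* Simulate a second caller arriving while we are still busy. */
	if (h->reentries_left > 0) {
		h->reentries_left--;
		model_send_frames(h);
	}

	h->tbusy = false;
	if (h->woke_up)
		goto check_again;
}
```

With one simulated re-entry the send body runs twice, but never nested: the
inner call returns immediately and the outer call picks up the extra pass.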