Re: [PATCH net v2] NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx

From: Krzysztof Kozlowski
Date: Tue May 17 2022 - 07:42:51 EST


On 17/05/2022 12:55, Duoming Zhou wrote:
> There are sleep-in-atomic-context bugs when a request to the secure
> element of st21nfca times out. The root cause is that kzalloc and
> alloc_skb with the GFP_KERNEL flag, as well as mutex_lock, are called in
> st21nfca_se_wt_timeout, which is a timer handler. The call tree shows
> the execution paths that could lead to bugs:
>
> (Interrupt context)
> st21nfca_se_wt_timeout
>   nfc_hci_send_event
>     nfc_hci_hcp_message_tx
>       kzalloc(..., GFP_KERNEL) //may sleep
>       alloc_skb(..., GFP_KERNEL) //may sleep
>       mutex_lock() //may sleep
>
> This patch changes the allocation mode of kzalloc and alloc_skb from
> GFP_KERNEL to GFP_ATOMIC and changes mutex_lock to spin_lock in
> order to prevent sleeping in atomic context.
>
> Fixes: 2130fb97fecf ("NFC: st21nfca: Adding support for secure element")
> Signed-off-by: Duoming Zhou <duoming@xxxxxxxxxx>
> ---
> Changes in v2:
> - Change mutex_lock to spin_lock.
>
> include/net/nfc/hci.h | 3 ++-
> net/nfc/hci/core.c | 18 +++++++++---------
> net/nfc/hci/hcp.c | 10 +++++-----
> 3 files changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/include/net/nfc/hci.h b/include/net/nfc/hci.h
> index 756c11084f6..8f66e6e6b91 100644
> --- a/include/net/nfc/hci.h
> +++ b/include/net/nfc/hci.h
> @@ -103,7 +103,8 @@ struct nfc_hci_dev {
>
> bool shutting_down;
>
> - struct mutex msg_tx_mutex;
> + /* The spinlock is used to protect resources related to hci message TX */
> + spinlock_t msg_tx_spin;
>
> struct list_head msg_tx_queue;
>
> diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
> index ceb87db57cd..fa22f9fe5fc 100644
> --- a/net/nfc/hci/core.c
> +++ b/net/nfc/hci/core.c
> @@ -68,7 +68,7 @@ static void nfc_hci_msg_tx_work(struct work_struct *work)
> struct sk_buff *skb;
> int r = 0;
>
> - mutex_lock(&hdev->msg_tx_mutex);
> + spin_lock(&hdev->msg_tx_spin);
> if (hdev->shutting_down)
> goto exit;

How did you test your patch?

Did you check, really check, that this can be an atomic (non-sleeping)
section?
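To make "atomic (non-sleeping) section" concrete, here is a minimal user-space model of the kernel's might_sleep() check. All names (model_might_sleep, model_alloc, model_timer_handler and so on) are hypothetical, not kernel APIs. The model assumes only what the patch description states: the timer handler runs in atomic context (timer_list callbacks run in softirq context), so a GFP_KERNEL allocation or a mutex_lock() there counts as a potential sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model of the kernel's "sleeping function called
 * from invalid context" check. None of these names are kernel APIs.
 */
static int atomic_depth; /* > 0 means the current code must not sleep */
static int violations;   /* sleeping calls made while atomic_depth > 0 */

static void model_might_sleep(const char *what)
{
	if (atomic_depth > 0) {
		violations++;
		printf("BUG: %s called from atomic context\n", what);
	}
}

enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

static void model_alloc(enum model_gfp gfp)
{
	/* GFP_KERNEL may block to reclaim memory; GFP_ATOMIC never sleeps. */
	if (gfp == MODEL_GFP_KERNEL)
		model_might_sleep("alloc(GFP_KERNEL)");
}

static void model_mutex_lock(void)
{
	model_might_sleep("mutex_lock"); /* a contended mutex sleeps */
}

/*
 * Stands in for a timer callback such as st21nfca_se_wt_timeout, which
 * runs in atomic context. Returns the number of violations observed.
 */
static int model_timer_handler(enum model_gfp gfp, bool take_mutex)
{
	violations = 0;
	atomic_depth++; /* enter atomic context */
	model_alloc(gfp);
	if (take_mutex)
		model_mutex_lock();
	atomic_depth--; /* leave atomic context */
	assert(atomic_depth == 0);
	return violations;
}
```

In this model, switching to GFP_ATOMIC removes only the allocation violation; the mutex still counts as a sleep, which is why v2 replaces it with a spinlock.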

I have doubts, because I found at least one path from your new atomic
section to device_lock() (which takes a mutex).
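The objection can be illustrated with a similar user-space model (hypothetical names, not kernel APIs). Once msg_tx_spin is a spinlock, every function reachable while it is held runs in atomic context, so a single path into device_lock(), which acquires the device's mutex, reintroduces the bug. The actual call path is not reproduced here; the reaches_device_lock flag is an assumption that stands in for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
static int atomic_depth; /* > 0 means the current code must not sleep */
static int violations;   /* sleeping calls made while atomic_depth > 0 */

static void model_might_sleep(const char *what)
{
	if (atomic_depth > 0) {
		violations++;
		printf("BUG: %s called from atomic context\n", what);
	}
}

/* Holding a spinlock disables preemption, so the holder must not sleep. */
static void model_spin_lock(void) { atomic_depth++; }
static void model_spin_unlock(void) { atomic_depth--; }

/* device_lock() takes dev->mutex, so it may sleep. */
static void model_device_lock(void) { model_might_sleep("device_lock"); }

/*
 * Stands in for the work done while msg_tx_spin is held. Whether the
 * real code reaches device_lock(), and through which callees, is
 * assumed here, not taken from the NFC sources.
 */
static void model_tx_critical_section(bool reaches_device_lock)
{
	if (reaches_device_lock)
		model_device_lock();
}

/* Returns the number of violations observed in one run. */
static int model_msg_tx_work(bool reaches_device_lock)
{
	violations = 0;
	model_spin_lock();
	model_tx_critical_section(reaches_device_lock);
	model_spin_unlock();
	assert(atomic_depth == 0);
	return violations;
}
```

On a real kernel, building with CONFIG_DEBUG_ATOMIC_SLEEP makes might_sleep() report such calls at runtime, which is one way to test a change like this before sending it.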

Before sending a new version, please wait for discussion to reach some
consensus. The quality of these fixes is really poor. :(

Best regards,
Krzysztof