Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT

From: Yunseong Kim
Date: Fri Aug 08 2025 - 13:36:10 EST


Hi Sebastian,

I was waiting for your review — thanks!

On 8/9/25 1:33 오전, Sebastian Andrzej Siewior wrote:
> On 2025-07-25 20:14:01 [+0000], Yunseong Kim wrote:
>> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, following
>> bug is triggered in the ksoftirqd context.
>>
>
>> This issue was introduced by commit
>> f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
>>
>> However, this creates a conflict on PREEMPT_RT kernels. The local_irq_save()
>> call establishes an atomic context where sleeping is forbidden. Inside this
>> context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping
>> locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in
>> a sleeping function called from invalid context.
>>
>> On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario
>> is already safely handled by the existing local_lock_t and the global
>> kcov_remote_lock within kcov_remote_start(). Therefore, the outer
>> local_irq_save() is not necessary.
>>
>> This preserves the intended re-entrancy protection for non-RT kernels while
>> resolving the locking violation on PREEMPT_RT kernels.
>>
>> After making this modification and testing it, syzkaller fuzzing the
>> PREEMPT_RT kernel is now running without stopping on latest announced
>> Real-time Linux.
>
> This looks oddly familiar because I removed the irq-disable bits while
> adding local-locks.
>
> Commit f85d39dd7ed8 looks wrong not that it shouldn't disable
> interrupts. The statement in the added comment
>
> | + * 2. Disables interrupts for the duration of the coverage collection section.
> | + * This allows avoiding nested remote coverage collection sections in the
> | + * softirq context (a softirq might occur during the execution of a work in
> | + * the BH workqueue, which runs with in_serving_softirq() > 0).
>
> is wrong. Softirqs are never nesting. While the BH workqueue is
> running another softirq does not occur. The softirq is raised (again)
> and will be handled _after_ BH workqueue is done. So this is already
> serialised.
>
> The issue is __usb_hcd_giveback_urb() always invokes
> kcov_remote_start_usb_softirq(). __usb_hcd_giveback_urb() itself is
> invoked from BH context (for the majority of HCDs) and from hardirq
> context for the root-HUB. This gets us to the scenario that that we are
> in the give-back path in softirq context and then invoke the function
> once again in hardirq context.
>
> I have no idea how kcov works but reverting the original commit and
> avoiding the false nesting due to hardirq context should do the trick,
> an untested patch follows.
>
> This isn't any different than the tasklet handling that was used before
> so I am not sure why it is now a problem.

Thank you for the detailed analysis and the patch. Your explanation about
the real re-entrancy issue being "softirq vs. hardirq" and the faulty
premise in the original commit makes perfect sense.

> Could someone maybe test this?

As you requested, I have tested your patch on my setup.

I can check that your patch resolves the issue. I have been running
the syzkaller for several hours, and the "sleeping function called
from invalid context" bug is no longer triggered.

> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -1636,7 +1636,6 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
> struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
> struct usb_anchor *anchor = urb->anchor;
> int status = urb->unlinked;
> - unsigned long flags;
>
> urb->hcpriv = NULL;
> if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
> @@ -1654,14 +1653,13 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
> /* pass ownership to the completion handler */
> urb->status = status;
> /*
> - * Only collect coverage in the softirq context and disable interrupts
> - * to avoid scenarios with nested remote coverage collection sections
> - * that KCOV does not support.
> - * See the comment next to kcov_remote_start_usb_softirq() for details.
> + * This function can be called in task context inside another remote
> + * coverage collection section, but kcov doesn't support that kind of
> + * recursion yet. Only collect coverage in softirq context for now.
> */
> - flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
> + kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
> urb->complete(urb);
> - kcov_remote_stop_softirq(flags);
> + kcov_remote_stop_softirq();
>
> usb_anchor_resume_wakeups(anchor);
> atomic_dec(&urb->use_count);
> diff --git a/include/linux/kcov.h b/include/linux/kcov.h
> index 75a2fb8b16c32..0143358874b07 100644
> --- a/include/linux/kcov.h
> +++ b/include/linux/kcov.h
> @@ -57,47 +57,21 @@ static inline void kcov_remote_start_usb(u64 id)
>
> /*
> * The softirq flavor of kcov_remote_*() functions is introduced as a temporary
> - * workaround for KCOV's lack of nested remote coverage sections support.
> - *
> - * Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337.
> - *
> - * kcov_remote_start_usb_softirq():
> - *
> - * 1. Only collects coverage when called in the softirq context. This allows
> - * avoiding nested remote coverage collection sections in the task context.
> - * For example, USB/IP calls usb_hcd_giveback_urb() in the task context
> - * within an existing remote coverage collection section. Thus, KCOV should
> - * not attempt to start collecting coverage within the coverage collection
> - * section in __usb_hcd_giveback_urb() in this case.
> - *
> - * 2. Disables interrupts for the duration of the coverage collection section.
> - * This allows avoiding nested remote coverage collection sections in the
> - * softirq context (a softirq might occur during the execution of a work in
> - * the BH workqueue, which runs with in_serving_softirq() > 0).
> - * For example, usb_giveback_urb_bh() runs in the BH workqueue with
> - * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
> - * the middle of its remote coverage collection section, and the interrupt
> - * handler might invoke __usb_hcd_giveback_urb() again.
> + * work around for kcov's lack of nested remote coverage sections support in
> + * task context. Adding support for nested sections is tracked in:
> + * https://bugzilla.kernel.org/show_bug.cgi?id=210337
> */
>
> -static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
> +static inline void kcov_remote_start_usb_softirq(u64 id)
> {
> - unsigned long flags = 0;
> -
> - if (in_serving_softirq()) {
> - local_irq_save(flags);
> + if (in_serving_softirq() && !in_hardirq())
> kcov_remote_start_usb(id);
> - }
> -
> - return flags;
> }
>
> -static inline void kcov_remote_stop_softirq(unsigned long flags)
> +static inline void kcov_remote_stop_softirq(void)
> {
> - if (in_serving_softirq()) {
> + if (in_serving_softirq() && !in_hardirq())
> kcov_remote_stop();
> - local_irq_restore(flags);
> - }
> }
>
> #ifdef CONFIG_64BIT
> @@ -131,11 +105,8 @@ static inline u64 kcov_common_handle(void)
> }
> static inline void kcov_remote_start_common(u64 id) {}
> static inline void kcov_remote_start_usb(u64 id) {}
> -static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
> -{
> - return 0;
> -}
> -static inline void kcov_remote_stop_softirq(unsigned long flags) {}
> +static inline void kcov_remote_start_usb_softirq(u64 id) {}
> +static inline void kcov_remote_stop_softirq(void) {}
>
> #endif /* CONFIG_KCOV */
> #endif /* _LINUX_KCOV_H */


I really impressed your "How to Not Break PREEMPT_RT" talk at LPC 22.


Tested-by: Yunseong Kim <ysk@xxxxxxxxxxx>


Thanks,

Yunseong Kim