Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
From: Thomas Gleixner
Date: Sat Jul 26 2025 - 08:00:12 EST
On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
> On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
>> static void __usb_hcd_giveback_urb(struct urb *urb)
>> {
>> (...snipped...)
>> kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>> if (in_serving_softirq()) {
>> local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>> kcov_remote_start_usb(id) {
>> kcov_remote_start(id) {
>> kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>> (...snipped...)
>> local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>> __local_lock_irqsave(lock, flags) {
>> #ifndef CONFIG_PREEMPT_RT
>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L125
>> #else
>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L235 // not calling local_irq_save(flags)
>> #endif
Right, it does not invoke local_irq_save(flags), but it takes the
underlying lock, which means it prevents reentrance.
> Ok, but then how does the big comment section for
> kcov_remote_start_usb_softirq() work, where it explicitly states:
>
> * 2. Disables interrupts for the duration of the coverage collection section.
> * This allows avoiding nested remote coverage collection sections in the
> * softirq context (a softirq might occur during the execution of a work in
> * the BH workqueue, which runs with in_serving_softirq() > 0).
> * For example, usb_giveback_urb_bh() runs in the BH workqueue with
> * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
> * the middle of its remote coverage collection section, and the interrupt
> * handler might invoke __usb_hcd_giveback_urb() again.
>
>
> You are removing half of this function entirely, which feels very wrong
> to me as any sort of solution, as you have just said that all of that
> documentation entry is now not needed.
I'm not so sure because kcov_percpu_data.lock is only held within
kcov_remote_start() and kcov_remote_stop(), but the above comment
suggests that the whole section needs to be serialized.
Though I'm not a KCOV wizard and might be completely wrong here.
If the whole section is required to be serialized, then this need
another local lock in kcov_percpu_data to work correctly on RT.
Thanks,
tglx