Re: CONFIG_PREEMPT_RT local_softirq_pending warning when ISR blocks

From: Brian Silverman
Date: Mon Mar 09 2015 - 20:37:09 EST


On Mon, Mar 9, 2015 at 12:08 PM, Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx> wrote:
> * Brian Silverman | 2015-03-05 00:16:20 [-0500]:
>
>>Beforehand, 000 is spending most of its time in interrupts, but bash
>>is doing something related to memory management on it in between.
>> bash-14721 [000] ......1 6854.629126: rt_spin_lock <-free_pcppages_bulk
>> bash-14721 [000] ....1.1 6854.629126: mm_page_pcpu_drain: page=ffffea000fa1aa40 pfn=4097705 order=0 migratetype=1
>> bash-14721 [000] ......1 6854.629127: get_pfnblock_flags_mask <-free_pcppages_bulk
>> bash-14721 [000] ......1 6854.629127: __mod_zone_page_state <-free_pcppages_bulk
>> bash-14721 [000] ....1.1 6854.629127: mm_page_pcpu_drain: page=ffffea000f572ac0 pfn=4021419 order=0 migratetype=0
>> bash-14721 [000] ......1 6854.629128: get_pfnblock_flags_mask <-free_pcppages_bulk
>> bash-14721 [000] ......1 6854.629128: __mod_zone_page_state <-free_pcppages_bulk
>>... # lots more virtually identical repetitions of those last 3 lines
>> bash-14721 [000] ....1.1 6854.629139: mm_page_pcpu_drain: page=ffffea000f481a80 pfn=4005994 order=0 migratetype=1
>> bash-14721 [000] ......1 6854.629139: get_pfnblock_flags_mask <-free_pcppages_bulk
> You free memory and hold the zone->lock
>
>> bash-14721 [000] d.....1 6854.629139: do_IRQ <-ret_from_intr
>> bash-14721 [000] d.....1 6854.629139: irq_enter <-do_IRQ
>>... # wakes up the can1 ISR thread on 001 and the can0 one on 000
>>(same physical IRQ line)
>> bash-14721 [000] d...3.1 6854.629261: sched_switch: prev_comm=bash prev_pid=14721 prev_prio=120 prev_state=R+ ==> next_comm=irq/18-can0 next_pid=2015 next_prio=28
>
> I would assume that this one raises NET_RX softirq. But at the bottom
> you have the irq handler on the other CPU which confuses me...

There wasn't actually any traffic on can0 for this test, so it didn't.
The can0 ISR only makes a few reads/writes to the device, never
calling netif_rx.

The can1 handler (which actually raises a NET_RX softirq) runs on 001
because it's pinned there.

>
>>... # runs the can0 ISR
>> irq/18-can0-2015 [000] d...3.. 6854.629283: sched_switch: prev_comm=irq/18-can0 prev_pid=2015 prev_prio=28 prev_state=S ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=98
>>...
>> ksoftirqd/0-3 [000] ....1.1 6854.629291: softirq_entry: vec=1 [action=TIMER]
>>...
>> ksoftirqd/0-3 [000] ....1.1 6854.629293: softirq_exit: vec=1 [action=TIMER]
> only the timer since nobody raised NET_RX

Correct. I included that as context for what 000 spent its time doing
with the lock held, not implying that it's incorrect.

>
>>...
>> ksoftirqd/0-3 [000] ....... 6854.629298: schedule <-smpboot_thread_fn ...
>> ksoftirqd/0-3 [000] d...3.. 6854.629304: sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=98 prev_state=S ==> next_comm=bash next_pid=14721 next_prio=28
>>...
>> bash-14721 [000] d...1.1 6854.629308: smp_trace_reschedule_interrupt <-trace_reschedule_interrupt
>># Actually unnecessary schedule IPI from 001?
>> bash-14721 [000] d...1.1 6854.629309: irq_enter <-smp_trace_reschedule_interrupt
>>...
>> bash-14721 [000] ....1.1 6854.629316: __tick_nohz_task_switch <-__schedule
>> bash-14721 [000] ......1 6854.629316: __mod_zone_page_state <-free_pcppages_bulk
>> bash-14721 [000] ....1.1 6854.629317: mm_page_pcpu_drain: page=ffffea000f57a900 pfn=4021924 order=0 migratetype=0
>> bash-14721 [000] ......1 6854.629317: get_pfnblock_flags_mask <-free_pcppages_bulk
>> bash-14721 [000] ......1 6854.629317: __mod_zone_page_state <-free_pcppages_bulk
> and it continues cleaning up memory.
>
>>... # more of this like it was doing before
>>I don't see it unlocking the problematic mutex before the trace ends.
>>
>>The relevant part for 001 starts while it's running the can1 ISR.
>> irq/18-can1-7228 [001] ....1.1 6854.629275: netif_rx: dev=can1 skbaddr=ffff880412d8fc00 len=16
>> irq/18-can1-7228 [001] ......1 6854.629275: migrate_disable <-netif_rx_internal
>> irq/18-can1-7228 [001] ......2 6854.629275: enqueue_to_backlog <-netif_rx_internal
>
> enqueue_to_backlog() looks like packet reception so this should be
> handled in napi so I assume we run in NET_RX softirq

enqueue_to_backlog is the napi function that actually raises the NET_RX softirq.

>> irq/18-can1-7228 [001] d.....2 6854.629276: _raw_spin_lock <-enqueue_to_backlog
>> irq/18-can1-7228 [001] d...1.2 6854.629276: __raise_softirq_irqoff <-enqueue_to_backlog
>> irq/18-can1-7228 [001] d...1.2 6854.629276: do_raise_softirq_irqoff <-__raise_softirq_irqoff
>> irq/18-can1-7228 [001] d...2.2 6854.629276: softirq_raise: vec=3 [action=NET_RX]
This is where the softirq gets raised.
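
For anyone following along, the bookkeeping behind that softirq_raise
line can be sketched as a per-CPU bitmask with one bit per vector. This
is a toy userspace model, not the kernel's code: model_pending,
model_raise_softirq(), model_softirq_pending() and model_run_softirqs()
are made-up names, and the two-CPU array and 32-bit width are
assumptions. Only the vector numbers (vec=1 TIMER, vec=3 NET_RX) come
from the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Vector numbers as they appear in the trace above. */
enum { TIMER_VEC = 1, NET_RX_VEC = 3 };

#define MODEL_NR_CPUS 2   /* assumption: just CPUs 000 and 001 */
#define MODEL_NR_VECS 32  /* assumption: width of the toy mask */

/* One pending mask per CPU; a set bit means "raised, not yet run". */
static uint32_t model_pending[MODEL_NR_CPUS];

/* Raising a softirq only sets a bit on the raising CPU. */
static void model_raise_softirq(unsigned cpu, unsigned vec)
{
    assert(cpu < MODEL_NR_CPUS && vec < MODEL_NR_VECS);
    model_pending[cpu] |= (uint32_t)1 << vec;
}

/* What an idle-entry check would look at for this CPU. */
static uint32_t model_softirq_pending(unsigned cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    return model_pending[cpu];
}

/* Processing the softirqs clears the mask again. */
static void model_run_softirqs(unsigned cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    model_pending[cpu] = 0;
}
```

In this model, raising NET_RX on CPU 1 leaves bit 3 set there until
something runs the softirqs on that same CPU. CPU 0 processing its own
TIMER softirq does not touch it.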

>>... # continues handling the can1 interrupt
>> irq/18-can1-7228 [001] ......6 6854.629286: rt_spin_lock <-get_page_from_freelist
>> irq/18-can1-7228 [001] ......6 6854.629287: rt_spin_lock_slowlock <-get_page_from_freelist
>> irq/18-can1-7228 [001] ......6 6854.629287: _raw_spin_lock <-rt_spin_lock_slowlock
>> irq/18-can1-7228 [001] ....1.6 6854.629287: __try_to_take_rt_mutex <-rt_spin_lock_slowlock
>> irq/18-can1-7228 [001] ....1.6 6854.629287: _raw_spin_lock_irq <-rt_spin_lock_slowlock
>> irq/18-can1-7228 [001] d...2.6 6854.629288: _raw_spin_unlock_irq <-rt_spin_lock_slowlock
>> irq/18-can1-7228 [001] ....1.6 6854.629288: task_blocks_on_rt_mutex <-rt_spin_lock_slowlock
>
> it might be zone->lock it goes after. It boosts the bash process which
> seems to free memory so it would make sense.
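
The boost described here (bash going from prio 120 to 28 in the
sched_pi_setprio line) can be sketched as a toy priority-inheritance
lock. This is only an illustration of the idea, not the rt_mutex
implementation: struct model_task, struct model_lock and both functions
are hypothetical names, and the model ignores wait queues and chained
boosting entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task. A lower prio value means a higher priority, as in the trace. */
struct model_task {
    const char *comm;
    int normal_prio;  /* priority without any boost */
    int prio;         /* effective priority, possibly boosted */
};

struct model_lock {
    struct model_task *owner;  /* NULL when the lock is free */
};

/* Returns 1 if the lock was taken, 0 if the caller would have to block. */
static int model_lock_acquire(struct model_lock *lock,
                              struct model_task *waiter)
{
    assert(lock != NULL && waiter != NULL);
    if (lock->owner == NULL) {
        lock->owner = waiter;
        return 1;
    }
    /* Contended: lend the waiter's priority to the owner if it is higher. */
    if (waiter->prio < lock->owner->prio)
        lock->owner->prio = waiter->prio;
    return 0;
}

/* Releasing the lock drops the owner back to its normal priority. */
static void model_lock_release(struct model_lock *lock)
{
    assert(lock != NULL && lock->owner != NULL);
    lock->owner->prio = lock->owner->normal_prio;
    lock->owner = NULL;
}
```

With bash (prio 120) holding the lock and the can1 IRQ thread (prio 28)
trying to take it, the model leaves bash at prio 28 until it releases
the lock, which matches the oldprio=120 newprio=28 values in the trace.
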
>
>>... # rt_mutex/scheduling stuff
>> irq/18-can1-7228 [001] d...4.6 6854.629291: sched_pi_setprio: comm=bash pid=14721 oldprio=120 newprio=28
>>... # more scheduler stuff
>> irq/18-can1-7228 [001] d...3.6 6854.629299: native_smp_send_reschedule <-rt_mutex_setprio
>>... # more scheduler stuff
>> irq/18-can1-7228 [001] d...2.6 6854.629307: pick_next_task_fair <-__schedule
>> irq/18-can1-7228 [001] d...2.6 6854.629307: pick_next_task_stop <-__schedule
>> irq/18-can1-7228 [001] d...2.6 6854.629307: pick_next_task_dl <-__schedule
>> irq/18-can1-7228 [001] d...2.6 6854.629307: pick_next_task_rt <-__schedule
>> irq/18-can1-7228 [001] d...2.6 6854.629307: pick_next_task_fair <-__schedule
>> irq/18-can1-7228 [001] d...2.6 6854.629308: pick_next_task_idle <-__schedule
>> irq/18-can1-7228 [001] d...3.6 6854.629308: sched_switch: prev_comm=irq/18-can1 prev_pid=7228 prev_prio=28 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120
>>...
>> <idle>-0 [001] d...1.. 6854.629319: softirq_check_pending_idle <-tick_nohz_idle_enter
>>My tracing_off() call is in softirq_check_pending_idle, so that's it.
>
> It looks like your softirq for net_rx is getting a packet and then after
> raising NET_RX (again?) it blocks on a lock. In order to get this lock
> it boosts and schedules bash. It gets runnable but on the other CPU. On
> CPU1 there is nothing going on and the only runnable task is
> the idle thread. And this is probably where the warning is written
> because we go to idle while we should process a softirq instead.

That sounds like the issue. Running the pending softirq instead of going
idle in this situation seems to mean calling thread_do_softirq() from
__schedule, but I don't know where the right place is. Can anybody
help me work out where exactly the check for pending softirqs should go?

Thanks,
Brian