Re: [PATCH] net/xfrm/xfrm_ipcomp: Use {get,put}_cpu_light

From: Daniel Bristot de Oliveira
Date: Wed Jul 17 2019 - 03:31:54 EST

On 17/07/2019 09:20, Juri Lelli wrote:
> The following BUG has been reported while running ipsec tests.
>
> BUG: scheduling while atomic: irq/78-eno3-rx-/12023/0x00000002
> Modules linked in: ipcomp xfrm_ipcomp ...
> Preemption disabled at:
> [<ffffffffc0b29730>] ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
> CPU: 1 PID: 12023 Comm: irq/78-eno3-rx- Kdump: loaded Not tainted [...] #1
> Hardware name: [...]
> Call Trace:
> dump_stack+0x5c/0x80
> ? ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
> __schedule_bug.cold.81+0x44/0x51
> __schedule+0x5bf/0x6a0
> schedule+0x39/0xd0
> rt_spin_lock_slowlock_locked+0x10e/0x2b0
> rt_spin_lock_slowlock+0x50/0x80
> get_page_from_freelist+0x609/0x1560
> ? zlib_updatewindow+0x5a/0xd0
> __alloc_pages_nodemask+0xd9/0x280
> ipcomp_input+0x299/0x9a0 [xfrm_ipcomp]
> xfrm_input+0x5e3/0x960
> xfrm4_ipcomp_rcv+0x34/0x50
> ip_local_deliver_finish+0x22d/0x250
> ip_local_deliver+0x6d/0x110
> ? ip_rcv_finish+0xac/0x480
> ip_rcv+0x28e/0x3f9
> ? packet_rcv+0x43/0x4c0
> __netif_receive_skb_core+0xb7c/0xd10
> ? inet_gro_receive+0x8e/0x2f0
> netif_receive_skb_internal+0x4a/0x160
> napi_gro_receive+0xee/0x110
> tg3_rx+0x2a8/0x810 [tg3]
> tg3_poll_work+0x3b3/0x830 [tg3]
> tg3_poll_msix+0x3b/0x170 [tg3]
> net_rx_action+0x1ff/0x470
> ? __switch_to_asm+0x41/0x70
> do_current_softirqs+0x223/0x3e0
> ? irq_thread_check_affinity+0x20/0x20
> __local_bh_enable+0x51/0x60
> irq_forced_thread_fn+0x5e/0x80
> ? irq_finalize_oneshot.part.45+0xf0/0xf0
> irq_thread+0x13d/0x1a0
> ? wake_threads_waitq+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_worker_on_cpu+0x70/0x70
> ret_from_fork+0x35/0x40
>
> The problem is that get_cpu(), called from ipcomp_input(), disables
> preemption. On RT, spin_lock() can sleep, so this triggers the scheduling
> while atomic BUG further down the call path of get_page_from_freelist(), i.e.,
>
>   ipcomp_input
>     ipcomp_decompress
>       <-- get_cpu()
>       alloc_page(GFP_ATOMIC)
>         alloc_pages(GFP_ATOMIC, 0)
>           alloc_pages_current
>             __alloc_pages_nodemask
>               get_page_from_freelist
>                 (try_this_zone:) rmqueue
>                   rmqueue_pcplist
>                     __rmqueue_pcplist
>                       rmqueue_bulk
>                         <-- spin_lock(&zone->lock) - BUG
>
> Fix this by using {get,put}_cpu_light() in ipcomp_decompress().
>
> Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
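
For anyone following along, here is a minimal userspace sketch of why the
switch helps. It is a model, not kernel code: every model_* name is made up
for illustration, and it assumes the worst case where the zone lock is always
contended. The only facts it relies on are that get_cpu() disables preemption,
that get_cpu_light() in the RT tree disables migration instead, and that an RT
spinlock may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state; none of this is kernel code. */
static int preempt_depth;   /* models preempt_count() */
static int migrate_depth;   /* models the migrate_disable() nesting level */
static int atomic_sleeps;   /* counts modeled "scheduling while atomic" hits */

/* get_cpu()/put_cpu(): stay on this CPU by disabling preemption. */
static void model_get_cpu(void) { preempt_depth++; }
static void model_put_cpu(void) { preempt_depth--; }

/* get_cpu_light()/put_cpu_light(): stay on this CPU by disabling migration. */
static void model_get_cpu_light(void) { migrate_depth++; }
static void model_put_cpu_light(void) { migrate_depth--; }

/*
 * On RT a spinlock_t may sleep when contended. Model the worst case:
 * every acquisition has to block, which is a BUG with preemption off.
 */
static void model_rt_spin_lock(void)
{
	if (preempt_depth > 0)
		atomic_sleeps++;
}

/* Stand-in for alloc_page(GFP_ATOMIC) reaching spin_lock(&zone->lock). */
static void model_alloc_page(void)
{
	model_rt_spin_lock();
}

/* Returns how many times the modeled BUG fired during one decompress. */
static int model_ipcomp_decompress(bool use_light)
{
	atomic_sleeps = 0;

	if (use_light)
		model_get_cpu_light();
	else
		model_get_cpu();

	model_alloc_page();

	if (use_light)
		model_put_cpu_light();
	else
		model_put_cpu();

	/* Both variants must leave the counters balanced. */
	assert(preempt_depth == 0 && migrate_depth == 0);
	return atomic_sleeps;
}
```

In this model, model_ipcomp_decompress(false) returns 1 (the BUG path) and
model_ipcomp_decompress(true) returns 0, since the task stays preemptible and
is allowed to block on the lock while remaining on the same CPU.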

Reviewed-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>

Thanks!
-- Daniel