Re: [PATCH 1/1] mm/page_alloc: Leave IRQs enabled for per-cpu page allocations

From: Yu Zhao
Date: Thu Aug 25 2022 - 00:59:09 EST


On Wed, Aug 24, 2022 at 8:18 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task
> allocating from the PCP must not re-enter the allocator from IRQ context.
> In each instance where IRQ-reentrancy is possible, the lock is acquired using
> pcp_spin_trylock_irqsave() even though IRQs are disabled and re-entrancy
> is impossible.
>
> Demoting the lock to pcp_spin_lock avoids an IRQ disable/enable in the common
> case at the cost of some IRQ allocations taking a slower path. If the PCP
> lists need to be refilled, the zone lock still needs to disable IRQs, but
> that will only happen on PCP refill and drain. If an IRQ is raised while
> a PCP allocation is in progress, the trylock will fail and fall back to
> using the buddy lists directly. Note that this may not be a universal win
> if an interrupt-intensive workload also allocates heavily from interrupt
> context and contends heavily on the zone->lock as a result.

Hi,

This patch caused the following warning. Please take a look.

Thanks.

WARNING: inconsistent lock state
6.0.0-dbg-DEV #1 Tainted: G S W O
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/27 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff9ce5002b8c58 (&pcp->lock){+.?.}-{2:2}, at: free_unref_page_list+0x1ac/0x260
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0xb3/0x190
_raw_spin_trylock+0x46/0x60
rmqueue_pcplist+0x42/0x1d0
rmqueue+0x58/0x590
get_page_from_freelist+0x2c3/0x510
__alloc_pages+0x126/0x210
alloc_page_interleave+0x13/0x90
alloc_pages+0xfb/0x250
__get_free_pages+0x11/0x30
__pte_alloc_kernel+0x1c/0xc0
vmap_p4d_range+0x448/0x690
ioremap_page_range+0xdc/0x130
__ioremap_caller+0x258/0x320
ioremap_cache+0x17/0x20
acpi_os_map_iomem+0x12f/0x1d0
acpi_os_map_memory+0xe/0x10
acpi_tb_acquire_table+0x42/0x6e
acpi_tb_validate_temp_table+0x43/0x55
acpi_tb_verify_temp_table+0x31/0x238
acpi_reallocate_root_table+0xe6/0x158
acpi_early_init+0x4f/0xd1
start_kernel+0x32a/0x44f
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x124/0x12b
secondary_startup_64_no_verify+0xe6/0xeb
irq event stamp: 961581
hardirqs last enabled at (961580): [<ffffffff95b2cde5>] _raw_spin_unlock_irqrestore+0x35/0x50
hardirqs last disabled at (961581): [<ffffffff951c1998>] folio_rotate_reclaimable+0xf8/0x310
softirqs last enabled at (961490): [<ffffffff94fa40d8>] run_ksoftirqd+0x48/0x90
softirqs last disabled at (961495): [<ffffffff94fa40d8>] run_ksoftirqd+0x48/0x90

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&pcp->lock);
<Interrupt>
lock(&pcp->lock);

*** DEADLOCK ***

1 lock held by ksoftirqd/2/27:
#0: ffff9ce5002adab8 (lock#7){..-.}-{2:2}, at: local_lock_acquire+0x0/0x70

stack backtrace:
CPU: 2 PID: 27 Comm: ksoftirqd/2 Tainted: G S W O 6.0.0-dbg-DEV #1
Call Trace:
<TASK>
dump_stack_lvl+0x6c/0x9a
dump_stack+0x10/0x12
print_usage_bug+0x374/0x380
mark_lock_irq+0x4a8/0x4c0
? save_trace+0x40/0x2c0
mark_lock+0x137/0x1b0
__lock_acquire+0x5bf/0x3540
? __SCT__tp_func_virtio_transport_recv_pkt+0x7/0x8
? lock_is_held_type+0x96/0x130
? rcu_read_lock_sched_held+0x49/0xa0
lock_acquire+0xb3/0x190
? free_unref_page_list+0x1ac/0x260
_raw_spin_lock+0x2f/0x40
? free_unref_page_list+0x1ac/0x260
free_unref_page_list+0x1ac/0x260
release_pages+0x90a/0xa70
? folio_batch_move_lru+0x138/0x190
? local_lock_acquire+0x70/0x70
folio_batch_move_lru+0x147/0x190
folio_rotate_reclaimable+0x168/0x310
folio_end_writeback+0x5d/0x200
end_page_writeback+0x18/0x40
end_swap_bio_write+0x100/0x2b0
? bio_chain+0x30/0x30
bio_endio+0xd8/0xf0
blk_update_request+0x173/0x340
scsi_end_request+0x2a/0x300
scsi_io_completion+0x66/0x140
scsi_finish_command+0xc0/0xf0
scsi_complete+0xec/0x110
blk_done_softirq+0x53/0x70
__do_softirq+0x1e2/0x357
? run_ksoftirqd+0x48/0x90
run_ksoftirqd+0x48/0x90
smpboot_thread_fn+0x14b/0x1c0
kthread+0xe6/0x100
? cpu_report_death+0x50/0x50
? kthread_blkcg+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
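
For reference, the pattern described in the quoted changelog is a trylock on the
per-cpu lock with a fallback to the buddy slow path when the trylock fails. Below
is a minimal userspace sketch of that idea, assuming hypothetical names
(pcp_lock, pcp_list, buddy_slowpath); it is an illustration only, not the kernel
implementation.

/*
 * Userspace analogy of the trylock-with-fallback fast path.  All names
 * here are hypothetical stand-ins, not the kernel's actual API.
 * Build with: cc sketch.c -o sketch -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_spinlock_t pcp_lock;     /* stands in for pcp->lock            */
static void *pcp_list[8];               /* stands in for the per-cpu free list */
static int pcp_count;

static void *buddy_slowpath(void)
{
        /* In the kernel, the slow path takes zone->lock with IRQs disabled. */
        return malloc(4096);
}

static void *alloc_page_sketch(void)
{
        void *page = NULL;

        /*
         * Fast path: trylock without disabling IRQs.  If the lock is
         * already held (e.g. we interrupted the holder on this CPU),
         * skip the per-cpu list and fall back to the slow path rather
         * than spinning on the lock.
         */
        if (pthread_spin_trylock(&pcp_lock) == 0) {
                if (pcp_count > 0)
                        page = pcp_list[--pcp_count];
                pthread_spin_unlock(&pcp_lock);
        }

        return page ? page : buddy_slowpath();
}

int main(void)
{
        pthread_spin_init(&pcp_lock, PTHREAD_PROCESS_PRIVATE);
        printf("page = %p\n", alloc_page_sketch());
        return 0;
}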