Re: [PATCH v3 2/2] rcu: Dump vmalloc memory info safely

From: Vlastimil Babka
Date: Thu Sep 07 2023 - 11:40:58 EST


On 9/6/23 21:18, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>
>> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
>> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
>> > > From: Zqiang <qiang.zhang1211@xxxxxxxxx>
>> > >
>> > > Currently, on a double call_rcu() invocation, the rcu_head object's
>> > > memory info is dumped. If the object was not allocated from the slab
>> > > allocator, vmalloc_dump_obj() is invoked, which needs to hold the
>> > > vmap_area_lock spinlock. Since call_rcu() can be invoked in interrupt
>> > > context, this can lead to a spinlock deadlock.
>> > >
>> > > On a Preempt-RT kernel, the rcutorture test also triggers the following
>> > > lockdep warning:
>> > >
>> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
>> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
>> > > preempt_count: 1, expected: 0
>> > > RCU nest depth: 1, expected: 1
>> > > 3 locks held by swapper/0/1:
>> > > #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>> > > #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>> > > #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
>> > > irq event stamp: 565512
>> > > hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
>> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
>> > > softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
>> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
>> > > Preemption disabled at:
>> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
>> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
>> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
>> > > Call Trace:
>> > > <TASK>
>> > > dump_stack_lvl+0x68/0xb0
>> > > dump_stack+0x14/0x20
>> > > __might_resched+0x1aa/0x280
>> > > ? __pfx_rcu_torture_err_cb+0x10/0x10
>> > > rt_spin_lock+0x53/0x130
>> > > ? find_vmap_area+0x1f/0x70
>> > > find_vmap_area+0x1f/0x70
>> > > vmalloc_dump_obj+0x20/0x60
>> > > mem_dump_obj+0x22/0x90
>> > > __call_rcu_common+0x5bf/0x940
>> > > ? debug_smp_processor_id+0x1b/0x30
>> > > call_rcu_hurry+0x14/0x20
>> > > rcu_torture_init+0x1f82/0x2370
>> > > ? __pfx_rcu_torture_leak_cb+0x10/0x10
>> > > ? __pfx_rcu_torture_leak_cb+0x10/0x10
>> > > ? __pfx_rcu_torture_init+0x10/0x10
>> > > do_one_initcall+0x6c/0x300
>> > > ? debug_smp_processor_id+0x1b/0x30
>> > > kernel_init_freeable+0x2b9/0x540
>> > > ? __pfx_kernel_init+0x10/0x10
>> > > kernel_init+0x1f/0x150
>> > > ret_from_fork+0x40/0x50
>> > > ? __pfx_kernel_init+0x10/0x10
>> > > ret_from_fork_asm+0x1b/0x30
>> > > </TASK>
>> > >
>> > > The previous patch fixes this by using the deadlock-safe best-effort
>> > > version of find_vm_area. However, in case of failure print the fact that
>> > > the pointer was a vmalloc pointer so that we print at least something.
>> > >
>> > > Reported-by: Zhen Lei <thunder.leizhen@xxxxxxxxxxxxxxx>
>> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
>> > > Cc: rcu@xxxxxxxxxxxxxxx
>> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
>> > > Cc: stable@xxxxxxxxxxxxxxx
>> > > Signed-off-by: Zqiang <qiang.zhang1211@xxxxxxxxx>
>> > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>> > > ---
>> > > mm/util.c | 4 +++-
>> > > 1 file changed, 3 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/mm/util.c b/mm/util.c
>> > > index dd12b9531ac4..406634f26918 100644
>> > > --- a/mm/util.c
>> > > +++ b/mm/util.c
>> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
>> > >  	if (vmalloc_dump_obj(object))
>> > >  		return;
>> > >
>> > > -	if (virt_addr_valid(object))
>> > > +	if (is_vmalloc_addr(object))
>> > > +		type = "vmalloc memory";
>> > > +	else if (virt_addr_valid(object))
>> > >  		type = "non-slab/vmalloc memory";
>> >
>> > I think you should update this to say non-slab/non-vmalloc memory (as much
>> > as that description sucks!) as this phrasing in the past meant to say
>> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
>> > clear.
>>
>> True, though the issue you mentioned is in existing code; a respin of this
>> patch could update it to say non-vmalloc. Good point, thanks for reviewing!
>
> No it's not, you're changing the meaning, because you changed the code
> that determines the output...

I think it has always meant "not slab && not vmalloc" (though the wording
is clearly ambiguous), both before and after this patch. The output can
only be wrong when patch 1 is applied without patch 2: in that case a
vmalloc pointer will, if the trylock fails, be classified as "not slab &&
not vmalloc". After patch 2 it seems fine to me.
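
To make the three cases concrete, here is a small userspace model of how a
vmalloc pointer ends up being reported. It is only a sketch: the enum and
function names are hypothetical, not kernel code, and it assumes that
before patch 1 the lookup always takes the lock, and that after patch 1 it
can fail on lock contention.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum patch_state { UNPATCHED, PATCH1_ONLY, PATCH1_AND_2 };

enum outcome {
	VMALLOC_DETAILS,        /* vmalloc_dump_obj() printed the area */
	VMALLOC_LABEL_ONLY,     /* only "vmalloc memory" was printed */
	TREATED_AS_NON_VMALLOC, /* fell through to the non-vmalloc checks */
};

/* How a vmalloc pointer is reported for a given patch state. */
enum outcome classify_vmalloc_ptr(enum patch_state state, bool trylock_ok)
{
	/* Assumption: the unpatched lookup always gets the lock. */
	bool lookup_ok = (state == UNPATCHED) || trylock_ok;

	if (lookup_ok)
		return VMALLOC_DETAILS;
	if (state == PATCH1_AND_2)
		return VMALLOC_LABEL_ONLY; /* the is_vmalloc_addr() branch */
	return TREATED_AS_NON_VMALLOC;
}

/* Walks the contended-lock case through all three states. */
void check_classification(void)
{
	assert(classify_vmalloc_ptr(UNPATCHED, false) == VMALLOC_DETAILS);
	assert(classify_vmalloc_ptr(PATCH1_ONLY, false) == TREATED_AS_NON_VMALLOC);
	assert(classify_vmalloc_ptr(PATCH1_AND_2, false) == VMALLOC_LABEL_ONLY);
}
```

Only the PATCH1_ONLY row with a failed trylock produces the misleading
classification.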

I guess if we wanted, we could also rewrite it to be more like the kmem
check at the beginning of mem_dump_obj(), so there would be:

	if (is_vmalloc_addr(...)) {
		vmalloc_dump_obj(...);
		return;
	}

where vmalloc_dump_obj() itself would print at least "vmalloc memory", with
no further details, if the trylock fails.

That assumes is_vmalloc_addr() is guaranteed to be true for every address
that __find_vmap_area() resolves; otherwise it could miss something
compared to the current code. Is that guaranteed?
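
A rough userspace model of that restructured flow, for illustration only.
All names below are hypothetical stand-ins rather than the kernel
functions, and the three booleans simply encode the conditions discussed
above. It does not answer the question; it only shows where the gap would
be if the guarantee did not hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pointer passed to mem_dump_obj(). */
struct ptr_model {
	bool in_vmalloc_range; /* models is_vmalloc_addr() */
	bool area_resolvable;  /* models __find_vmap_area() finding an area */
	bool trylock_ok;       /* models the trylock succeeding */
};

enum report {
	REPORT_VMALLOC_DETAILS, /* "vmalloc memory" plus area details */
	REPORT_VMALLOC_ONLY,    /* "vmalloc memory", nothing more */
	REPORT_FALL_THROUGH,    /* left to the remaining checks */
};

/* Models the proposed vmalloc_dump_obj(): it always reports something. */
enum report model_vmalloc_dump_obj(struct ptr_model p)
{
	if (p.trylock_ok && p.area_resolvable)
		return REPORT_VMALLOC_DETAILS;
	return REPORT_VMALLOC_ONLY;
}

/* Models the restructured check in mem_dump_obj(). */
enum report model_mem_dump_obj(struct ptr_model p)
{
	if (p.in_vmalloc_range)
		return model_vmalloc_dump_obj(p);
	/* area_resolvable is never consulted on this path. */
	return REPORT_FALL_THROUGH;
}

/* Exercises the contended case and the hypothetical gap. */
void check_restructured_flow(void)
{
	struct ptr_model contended = { true, true, false };
	struct ptr_model gap = { false, true, true };

	assert(model_mem_dump_obj(contended) == REPORT_VMALLOC_ONLY);
	assert(model_mem_dump_obj(gap) == REPORT_FALL_THROUGH);
}
```

The "gap" case is the one the question is about: an address that a lookup
could resolve but that is_vmalloc_addr() rejects would no longer reach
vmalloc_dump_obj() at all.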

> This has been merged now despite my outstanding comments (!) so I
> guess I'll have to send a follow up patch to address this.
>
>>
>> - Joel
>>
>
>
>
> --
> Lorenzo Stoakes
> https://ljs.io