Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot

From: Vlastimil Babka
Date: Tue Mar 01 2022 - 04:41:15 EST


On 3/1/22 10:21, Mike Rapoport wrote:
> On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
>> On 2/28/22 21:01, Mike Rapoport wrote:
>> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> >> Hi,
>> >> >>
>> >> >> this series combines and revives patches from Oliver's bachelor thesis
>> >> >> from last year (where I was the advisor) that make SLUB's debugfs
>> >> >> files alloc_traces and free_traces more useful.
>> >> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> >> as explained in patch 2.
>> >> >>
>> >> >
>> >> > Hello. I just started review/testing this series.
>> >> >
>> >> > It crashed on my system (arm64).
>> >>
>> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> >> from memblock. arm64 must have memblock freeing happen earlier or something.
>> >> (CCing memblock experts)
>> >>
>> >> > I ran with boot parameter slub_debug=U, and without KASAN.
>> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> >> >
>> >> > void * __init memblock_alloc_try_nid(
>> >> > 			phys_addr_t size, phys_addr_t align,
>> >> > 			phys_addr_t min_addr, phys_addr_t max_addr,
>> >> > 			int nid)
>> >> > {
>> >> > 	void *ptr;
>> >> >
>> >> > 	memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> >> > 		     __func__, (u64)size, (u64)align, nid, &min_addr,
>> >> > 		     &max_addr, (void *)_RET_IP_);
>> >> > 	ptr = memblock_alloc_internal(size, align,
>> >> > 				      min_addr, max_addr, nid, false);
>> >> > 	if (ptr)
>> >> > 		memset(ptr, 0, size);		<--- Crash Here
>> >> >
>> >> > 	return ptr;
>> >> > }
>> >> >
>> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> >> > memblock_alloc().
>> >> >
>> >> > I think that's because, in kmem_cache_init(), neither slab nor memblock is
>> >> > available. (AFAIU memblock is not available after mem_init() because of
>> >> > memblock_free_all(), right?)
>> >>
>> >> Hm yes, I see: even the x86_64 version of mem_init() calls memblock_free_all().
>> >> But then I would expect stack_depot_init() to detect that memblock_alloc()
>> >> returns NULL, print "Stack Depot hash table allocation failed,
>> >> disabling" and disable it. Instead it seems memblock_alloc() returns
>> >> something that's already potentially used by somebody else? Sounds like a bug?
>> >
>> > If stack_depot_init() is called from kmem_cache_init(), there will be
>> > confusion about which allocator should be used, because we use slab_is_available()
>> > to stop using memblock and start using kmalloc() instead in both
>> > stack_depot_init() and in memblock.
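
(For reference, the relevant logic in lib/stackdepot.c as of 5.17 looks roughly
like the following; this is simplified from my reading of the code, so details
may differ:

	if (slab_is_available())
		/* late init: buddy/slab are up, use the regular allocator */
		stack_table = kvmalloc(size, GFP_KERNEL);
	else
		/* early init: expected to run while memblock still owns the memory */
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);

	if (!stack_table) {
		pr_err("Stack Depot hash table allocation failed, disabling\n");
		stack_depot_disable = true;
	}

so in the window between memblock_free_all() and slab_is_available() becoming
true it still takes the memblock path, even though memblock has already handed
its memory over to the buddy allocator.)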
>>
>> I did check that stack_depot_init() is called from kmem_cache_init()
>> *before* we make slab_is_available() true, hence assumed that memblock would
>> still be available at that point and expected no confusion. But it seems that
>> if memblock is already beyond memblock_free_all(), then it being still
>> available is just an illusion?
>
> Yeah, it appears it is an illusion :)
>
> I think we have to deal with allocations that happen between
> memblock_free_all() and slab_is_available() at the memblock level and then
> figure out where to put stack_depot_init() and how to allocate memory
> there.
>
> I believe something like this (untested) patch below addresses the first
> issue. As for stack_depot_init() I'm still trying to figure out the
> possible call paths, but it seems we can use stack_depot_early_init() for
> the SLUB debugging case. I'll try to come up with something Really Soon (tm).

Yeah, as you already noticed, we are pursuing an approach that decides on
calling stack_depot_early_init(), which should be a good way to solve this
given how special slab is in this case. For memblock I just wanted to point
out that it could be more robust; your patch below seems to be on the right
path. Maybe it just doesn't have to fall back to buddy, which could be
considered a layering violation, but could simply return NULL, which can be
immediately recognized as an error?
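
To illustrate the idea, something like this (completely untested) in
memblock_alloc_internal(), reusing the mem_freed flag from your patch below,
instead of the buddy fallback:

	/*
	 * memblock has already handed its memory over to the buddy allocator;
	 * refuse to hand out ranges it no longer owns and let the caller deal
	 * with the error.
	 */
	if (memblock.mem_freed) {
		pr_warn_once("memblock: allocation attempted after memblock_free_all()\n");
		return NULL;
	}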

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 50ad19662a32..4ea89d44d22a 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -90,6 +90,7 @@ struct memblock_type {
>   */
>  struct memblock {
>  	bool bottom_up;  /* is bottom up direction? */
> +	bool mem_freed;
>  	phys_addr_t current_limit;
>  	struct memblock_type memory;
>  	struct memblock_type reserved;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b12a364f2766..60196dc4980e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
>  	.reserved.name		= "reserved",
> 
>  	.bottom_up		= false,
> +	.mem_freed		= false,
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
>  };
> 
> @@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
>  	if (WARN_ON_ONCE(slab_is_available()))
>  		return kzalloc_node(size, GFP_NOWAIT, nid);
> 
> +	if (memblock.mem_freed) {
> +		unsigned int order = get_order(size);
> +
> +		pr_warn("memblock: allocating from buddy\n");
> +		return __alloc_pages_node(nid, order, GFP_KERNEL);
> +	}
> +
>  	if (max_addr > memblock.current_limit)
>  		max_addr = memblock.current_limit;
> 
> @@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
> 
>  	pages = free_low_memory_core_early();
>  	totalram_pages_add(pages);
> +	memblock.mem_freed = true;
>  }
> 
>  #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
> 
>> > Hyeonggon, did you run your tests with panic_on_warn set, by any chance?
>> >
>> >> > Thanks!
>> >> >
>> >> > /*
>> >> >  * Set up kernel memory allocators
>> >> >  */
>> >> > static void __init mm_init(void)
>> >> > {
>> >> > 	/*
>> >> > 	 * page_ext requires contiguous pages,
>> >> > 	 * bigger than MAX_ORDER unless SPARSEMEM.
>> >> > 	 */
>> >> > 	page_ext_init_flatmem();
>> >> > 	init_mem_debugging_and_hardening();
>> >> > 	kfence_alloc_pool();
>> >> > 	report_meminit();
>> >> > 	stack_depot_early_init();
>> >> > 	mem_init();
>> >> > 	mem_init_print_info();
>> >> > 	kmem_cache_init();
>> >> > 	/*
>> >> > 	 * page_owner must be initialized after buddy is ready, and also after
>> >> > 	 * slab is ready so that stack_depot_init() works properly
>> >> > 	 */
>> >> >
>> >> >> Patch 1 is a new preparatory cleanup.
>> >> >>
>> >> >> Patch 2, originally submitted here [1], was merged to mainline but
>> >> >> reverted for stackdepot-related issues, as explained in the patch.
>> >> >>
>> >> >> Patches 3-5 were originally submitted as an RFC here [2]. In this submission
>> >> >> I have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> >> >> be considered too intrusive, so I will postpone it for later. The docs
>> >> >> patch is adjusted accordingly.
>> >> >>
>> >> >> Also available in git, based on v5.17-rc1:
>> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> >> >>
>> >> >> I'd like to ask for some review before I add this to the slab tree.
>> >> >>
>> >> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@xxxxxxxxx/
>> >> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@xxxxxxxxx/
>> >> >>
>> >> >> Oliver Glitta (4):
>> >> >> mm/slub: use stackdepot to save stack trace in objects
>> >> >> mm/slub: aggregate and print stack traces in debugfs files
>> >> >> mm/slub: sort debugfs output by frequency of stack traces
>> >> >> slab, documentation: add description of debugfs files for SLUB caches
>> >> >>
>> >> >> Vlastimil Babka (1):
>> >> >> mm/slub: move struct track init out of set_track()
>> >> >>
>> >> >> Documentation/vm/slub.rst | 61 +++++++++++++++
>> >> >> init/Kconfig | 1 +
>> >> >> mm/slub.c | 152 +++++++++++++++++++++++++-------------
>> >> >> 3 files changed, 162 insertions(+), 52 deletions(-)
>> >> >>
>> >> >> --
>> >> >> 2.35.1
>> >> >>
>> >> >>
>> >> >
>> >>
>> >
>>
>