Re: devm_memremap_pages() triggers a kasan_add_zero_shadow() warning

From: Qian Cai
Date: Wed Aug 21 2019 - 17:12:13 EST


On Sat, 2019-08-17 at 23:25 -0400, Qian Cai wrote:
> > On Aug 17, 2019, at 12:59 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >
> > On Sat, Aug 17, 2019 at 4:13 AM Qian Cai <cai@xxxxxx> wrote:
> > >
> > >
> > >
> > > > On Aug 16, 2019, at 11:57 PM, Dan Williams <dan.j.williams@xxxxxxxxx>
> > > > wrote:
> > > >
> > > > On Fri, Aug 16, 2019 at 8:34 PM Qian Cai <cai@xxxxxx> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > On Aug 16, 2019, at 5:48 PM, Dan Williams <dan.j.williams@xxxxxxxxx>
> > > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 16, 2019 at 2:36 PM Qian Cai <cai@xxxxxx> wrote:
> > > > > > >
> > > > > > > Every so often recently, booting Intel CPU server on linux-next
> > > > > > > triggers this
> > > > > > > > warning. Trying to figure out if the commit 7cc7867fb061
> > > > > > > ("mm/devm_memremap_pages: enable sub-section remap") is the
> > > > > > > culprit here.
> > > > > > >
> > > > > > > # ./scripts/faddr2line vmlinux devm_memremap_pages+0x894/0xc70
> > > > > > > devm_memremap_pages+0x894/0xc70:
> > > > > > > devm_memremap_pages at mm/memremap.c:307
> > > > > >
> > > > > > Previously the forced section alignment in devm_memremap_pages()
> > > > > > would
> > > > > > cause the implementation to never violate the
> > > > > > KASAN_SHADOW_SCALE_SIZE
> > > > > > (12K on x86) constraint.
> > > > > >
> > > > > > Can you provide a dump of /proc/iomem? I'm curious what resource is
> > > > > > triggering such a small alignment granularity.
> > > > >
> > > > > This is with memmap=4G!4G ,
> > > > >
> > > > > # cat /proc/iomem
> > > >
> > > > [..]
> > > > > 100000000-155dfffff : Persistent Memory (legacy)
> > > > > 100000000-155dfffff : namespace0.0
> > > > > 155e00000-15982bfff : System RAM
> > > > > 155e00000-156a00fa0 : Kernel code
> > > > > 156a00fa1-15765d67f : Kernel data
> > > > > 157837000-1597fffff : Kernel bss
> > > > > 15982c000-1ffffffff : Persistent Memory (legacy)
> > > > > 200000000-87fffffff : System RAM
> > > >
> > > > Ok, looks like 4G is a bad choice to land the pmem emulation on this
> > > > system because it collides with where the kernel is deployed and gets
> > > > broken into tiny pieces that violate kasan's alignment constraint. This
> > > > is a known problem with memmap=. You need to pick a memory range that
> > > > does not collide with anything else. See:
> > > >
> > > >   https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_parameter_for_pmem_on_your_system
> > > >
> > > > ...for more info.
> > >
> > > Well, it seems I did exactly follow the information in that link,
> > >
> > > [    0.000000] BIOS-provided physical RAM map:
> > > [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000093fff] usable
> > > [    0.000000] BIOS-e820: [mem 0x0000000000094000-0x000000000009ffff] reserved
> > > [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> > > [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000005a7a0fff] usable
> > > [    0.000000] BIOS-e820: [mem 0x000000005a7a1000-0x000000005b5e0fff] reserved
> > > [    0.000000] BIOS-e820: [mem 0x000000005b5e1000-0x00000000790fefff] usable
> > > [    0.000000] BIOS-e820: [mem 0x00000000790ff000-0x00000000791fefff] reserved
> > > [    0.000000] BIOS-e820: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
> > > [    0.000000] BIOS-e820: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
> > > [    0.000000] BIOS-e820: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
> > > [    0.000000] BIOS-e820: [mem 0x000000007b800000-0x000000008fffffff] reserved
> > > [    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
> > > [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000087fffffff] usable
> > >
> > > Where 4G is good. Then,
> > >
> > > [    0.000000] user-defined physical RAM map:
> > > [    0.000000] user: [mem 0x0000000000000000-0x0000000000093fff] usable
> > > [    0.000000] user: [mem 0x0000000000094000-0x000000000009ffff] reserved
> > > [    0.000000] user: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> > > [    0.000000] user: [mem 0x0000000000100000-0x000000005a7a0fff] usable
> > > [    0.000000] user: [mem 0x000000005a7a1000-0x000000005b5e0fff] reserved
> > > [    0.000000] user: [mem 0x000000005b5e1000-0x00000000790fefff] usable
> > > [    0.000000] user: [mem 0x00000000790ff000-0x00000000791fefff] reserved
> > > [    0.000000] user: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
> > > [    0.000000] user: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
> > > [    0.000000] user: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
> > > [    0.000000] user: [mem 0x000000007b800000-0x000000008fffffff] reserved
> > > [    0.000000] user: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
> > > [    0.000000] user: [mem 0x0000000100000000-0x00000001ffffffff] persistent (type 12)
> > > [    0.000000] user: [mem 0x0000000200000000-0x000000087fffffff] usable
> > >
> > > The doc did mention that "There seems to be an issue with CONFIG_KSAN at
> > > the moment however." without more detail though.
> >
> > Does disabling CONFIG_RANDOMIZE_BASE help? Maybe that workaround has
> > regressed. Effectively we need to find what is causing the kernel to
> > sometimes be placed in the middle of a custom reserved memmap= range.
>
> Yes, disabling KASLR works well so far. Assuming the workaround, i.e.,
> f28442497b5c ("x86/boot: Fix KASLR and memmap= collision") is correct.
>
> The only other commit that might regress it from my research so far is,
>
> d52e7d5a952c ("x86/KASLR: Parse all 'memmap=' boot option entries")
>

It turns out that the original commit f28442497b5c ("x86/boot: Fix KASLR and
memmap= collision") has a bug: it cannot handle "memmap=" when it is given in
CONFIG_CMDLINE rather than as a parameter from the bootloader. When that commit
(as well as commit d52e7d5a952c) calls get_cmd_line_ptr() in order to run
mem_avoid_memmap(), "boot_params" has no knowledge of CONFIG_CMDLINE. Only
later, in setup_arch(), does the kernel process the CONFIG_CMDLINE parameters.
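
To make the ordering problem concrete, here is a minimal user-space sketch in
C. It is not kernel code: the two strings and the helpers has_memmap(),
early_sees_memmap() and late_sees_memmap() are hypothetical stand-ins that I
made up for illustration. It only assumes that the early KASLR stage looks at
the bootloader-provided command line alone, while the later stage sees the
built-in command line as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sample strings, not taken from a real boot. */
static const char *bootloader_cmdline = "root=/dev/sda1 ro"; /* via boot_params */
static const char *builtin_cmdline = "memmap=4G!4G";         /* CONFIG_CMDLINE */

/* Return true if some option in cmdline starts with "memmap=". */
static bool has_memmap(const char *cmdline)
{
	const char *p = cmdline;

	while ((p = strstr(p, "memmap=")) != NULL) {
		/* Accept only a match at the start of an option. */
		if (p == cmdline || p[-1] == ' ')
			return true;
		p++;
	}
	return false;
}

/* Early stage (assumed): only the bootloader string is visible. */
static bool early_sees_memmap(void)
{
	return has_memmap(bootloader_cmdline);
}

/* Late stage (assumed): built-in and bootloader strings are combined. */
static bool late_sees_memmap(void)
{
	char merged[512];
	int n = snprintf(merged, sizeof(merged), "%s %s",
			 builtin_cmdline, bootloader_cmdline);

	/* The sample strings are short, so truncation is not expected. */
	assert(n > 0 && (size_t)n < sizeof(merged));
	return has_memmap(merged);
}
```

With these sample strings, early_sees_memmap() returns false and
late_sees_memmap() returns true, so in this model the early stage has no
memmap= range to avoid when it picks a location for the kernel.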