Re: Issue With Kernel Changes To Core Dump Collection (Kernel Bug...?)

From: Randy Dunlap
Date: Fri Jan 21 2022 - 13:18:04 EST


[add the patch author, Jann]


On 1/20/22 17:31, Bill Messmer wrote:
> Hello,
>
> It has been my understanding for some time that the kernel config option CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS (and the corresponding bit 4 of the coredump filter) was added to ensure that the GNU build-id of ELF objects is included in core dumps.  The option's description in Kconfig.binfmt even alludes to this.
>
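
[A quick way to confirm that bit 4 is actually set for a process, as a minimal sketch. It assumes the mask is exposed as hexadecimal text in /proc/<pid>/coredump_filter, as described in core(5); the function name is made up for illustration.]

```python
ELF_HEADERS_BIT = 4  # bit 4 of coredump_filter: "dump ELF headers"


def elf_headers_enabled(filter_text: str) -> bool:
    """Return True if bit 4 is set in a coredump_filter value.

    filter_text is the raw content of /proc/<pid>/coredump_filter,
    i.e. a hexadecimal mask without a 0x prefix.
    """
    mask = int(filter_text.strip(), 16)
    return bool(mask & (1 << ELF_HEADERS_BIT))


# Typical use on a Linux box (not executed here):
#   with open("/proc/self/coredump_filter") as f:
#       print(elf_headers_enabled(f.read()))
```

A mask of 00000033 has the bit set; 00000023 does not.
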
> I am trying to understand why, in the 5.10+ kernels, the kernel changed from checking whether a given memory mapping has an ELF header to checking whether the inode is executable when deciding whether to include the page.  The change in question:
>
> github.com/torvalds/linux/commit/429a22e776a2b9f85a2b9c53d8e647598b553dd1
>

Bill,
You should send email(s) to the relevant people if you can identify them.
LKML is a huge pipe (hose) and people don't normally browse it. :)

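[To make the difference between the two checks described above concrete, here is a rough userspace analogy. It is not the kernel code: the kernel looks at the mapping and the inode, while this sketch looks at a file on disk. Both function names are hypothetical.]

```python
import os
import stat

ELF_MAGIC = b"\x7fELF"


def starts_with_elf_header(path: str) -> bool:
    """Analogy for the older check: does the content begin with the ELF magic?"""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def has_any_exec_bit(path: str) -> bool:
    """Analogy for the newer check: is any execute permission bit set?"""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

A shared object installed with mode 0644 passes the first test but fails the second, which matches the behaviour reported below.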

> In many distributions (e.g., Ubuntu), the shared objects in /usr/lib and elsewhere are not marked as executable.  One net effect is that the first page of each shared object on these distributions is no longer captured in core dumps.
>
> A core dump taken on Ubuntu 21.10 (with the 5.13 kernel) will, by default, not include these pages:
>
>   LOAD           0x0000000000007000 0x00007f375855f000 0x0000000000000000
>                  0x0000000000000000 0x000000000002c000  R      0x1000
>
>    0x00007f375855f000  0x00007f375858b000  0x0000000000000000
>         /usr/lib/x86_64-linux-gnu/libc.so.6
>
> Doing a quick "sudo chmod +x /usr/lib/x86_64-linux-gnu/libc.so.6" and repeating the dump shows that the page is then included:
>
>   LOAD           0x0000000000007000 0x00007fefd5282000 0x0000000000000000
>                  0x0000000000001000 0x000000000002c000  R      0x1000
>
>     0x00007fefd5282000  0x00007fefd52ae000  0x0000000000000000
>         /usr/lib/x86_64-linux-gnu/libc.so.6
>
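
[For comparing the two listings above mechanically, a small sketch. It assumes they are in the two-line format printed by "readelf -l" for 64-bit files (Offset, VirtAddr, PhysAddr on the first line; FileSiz, MemSiz, Flags, Align on the second). The function name is made up.]

```python
def parse_load_segments(text: str) -> list:
    """Extract vaddr, filesz and memsz from two-line LOAD entries."""
    segments = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        fields = line.split()
        if fields[0] != "LOAD" or i + 1 >= len(lines):
            continue
        # First line: type, file offset, virtual address, physical address.
        vaddr = int(fields[2], 16)
        # Second line: size in the file, size in memory, flags, alignment.
        nxt = lines[i + 1].split()
        segments.append({
            "vaddr": vaddr,
            "filesz": int(nxt[0], 16),
            "memsz": int(nxt[1], 16),
        })
    return segments
```

For the first listing this gives filesz 0 (nothing from the mapping was written to the core file); for the second it gives filesz 0x1000, i.e. one page.
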
> Prior to running with 5.10+ kernels, I was always seeing the first page of shared objects (and the contained build-id) within core dumps (assuming the proper kernel config and core dump filter bits).  Not any longer.
>
> The reason I ask this is that, as more teams here at Microsoft have products running on Linux (or in Linux containers), we have been pushing the crash reports for those up through the same post-mortem crash analysis infrastructure that we do for Windows.  That means that what has traditionally been the Windows debugger (e.g.: WinDbg) has, for some time, been able to open, debug, and analyze various Linux post-mortem crash formats.  Part of doing this on a post-mortem basis requires finding the original images and debug information for the executables and shared objects referenced in those core dumps.  Whether we do that via our own symbol servers or via a debuginfod service -- the post-mortem debugger needs access to the build-ids of those objects.
>
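
[To illustrate why the first page matters to a post-mortem debugger, here is a minimal sketch of pulling the GNU build-id out of a captured ELF image. It assumes a little-endian ELF64 image whose PT_NOTE segment lies inside the captured bytes; it is not how WinDbg or any particular tool does it, and the function name is hypothetical.]

```python
import struct

PT_NOTE = 4
NT_GNU_BUILD_ID = 3


def find_gnu_build_id(data: bytes):
    """Return the GNU build-id as a hex string, or None if not found."""
    # Require a full ELF64 header, 64-bit class (2), little-endian data (1).
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        return None
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        if base + 56 > len(data):
            break  # program header table runs past the captured bytes
        p_type, _flags, p_offset, _va, _pa, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, base)
        if p_type != PT_NOTE:
            continue
        pos, end = p_offset, min(p_offset + p_filesz, len(data))
        while pos + 12 <= end:
            namesz, descsz, ntype = struct.unpack_from("<III", data, pos)
            name_start = pos + 12
            desc_start = name_start + ((namesz + 3) & ~3)  # 4-byte padding
            desc_end = desc_start + descsz
            if desc_end > end:
                break
            name = data[name_start:name_start + namesz]
            if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
                return data[desc_start:desc_end].hex()
            pos = desc_start + ((descsz + 3) & ~3)
    return None
```

If the page holding the ELF header is missing from the dump, the function has nothing to parse and returns None, which is the failure mode described in this report.
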
> Until recently, finding these from a core dump has been stable and working quite well.  Of late, however, we have been seeing a number of crash reports (e.g.: from Debian or Ubuntu containers) where we can no longer find images & symbols based on the core dumps because this kernel change has caused the first page of shared object files to not be captured in core dumps.  I don't know how many post-mortem Linux crash analysis solutions this is affecting... 
>
> Was the change here really the intent...?  or is this a kernel bug?
>
> Sincerely,
>
> Bill Messmer
> wmessmer@xxxxxxxxxxxxx

--
~Randy