[BUG REPORT] Coredump allocates and maps unallocated sectors of a program's shared memory regions during program termination

From: Abhi Das (abhida)
Date: Sat Feb 11 2023 - 00:11:30 EST


Hello,

I have noticed a flaw in the Linux kernel's coredump system. Unallocated sectors of shared memory regions in a program's virtual address space are forcefully allocated and mapped when the program terminates and produces a coredump.

I created a simple 1 GB shared memory mapping and wrote only 4 KB of data to the region. I expected only one 4 KB page to be allocated and mapped. On program termination with coredumps enabled, I expected the program's core file to contain just 4 KB of data for the shared memory region in question.
I also expected the corresponding file for the shared memory region, in /dev/shm, to be backed by a single 4 KB block on the filesystem.
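
For illustration, the setup described above can be sketched roughly as follows. This is not the reproducer attached to the bugzilla entry; it is a minimal sketch that only creates the mapping and measures the backing store, without triggering a coredump. The function name map_and_touch and the object name shm_sketch_1G are made up for this example, and the 4096-byte page size is an assumption.

```python
import mmap
import os
import tempfile

PAGE = 4096  # assumed page size; os.sysconf("SC_PAGE_SIZE") reports the real one


def map_and_touch(path, region_size, bytes_to_write):
    """Create a sparse file of region_size bytes, map it shared, write
    bytes_to_write bytes at the start, and return
    (apparent_size, allocated_bytes) as reported by fstat()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, region_size)          # sets the size, allocates nothing yet
        with mmap.mmap(fd, region_size) as m:  # MAP_SHARED is the default on Unix
            m[:bytes_to_write] = b"\xAA" * bytes_to_write
            m.flush()                          # msync() the dirty range
        st = os.fstat(fd)
        return st.st_size, st.st_blocks * 512  # st_blocks counts 512-byte units
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical object name; fall back to the temp dir if /dev/shm is absent.
    base = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
    path = os.path.join(base, "shm_sketch_1G")
    apparent, allocated = map_and_touch(path, 1 << 30, PAGE)
    print(f"apparent={apparent} allocated={allocated}")
    os.unlink(path)
```

The expectation stated above is that the allocated figure stays near the number of bytes written, while the apparent size equals the full region size.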

However, I observed that the terminated process' core file contained 1 GB of data for the shared memory region, and that the file for the shared memory region within /dev/shm was now backed by blocks on the filesystem adding up to 1 GB of data. This indicates that coredump forcefully allocated and mapped the remaining unused pages in the 1 GB shared memory region.

I have replicated this issue on an Ubuntu VM with 8G of RAM and vanilla Linux kernels 6.1, 6.0, 5.18, 5.15.64 and 5.4.212. All vanilla kernels were compiled and installed manually, and the system was never tainted. Although other kernel versions were not tested, I am quite confident that the issue exists in them as well.

I have also filed the details of this issue in bugzilla, including the steps to reproduce it and the code I used to reproduce the bug: https://bugzilla.kernel.org/show_bug.cgi?id=217010
I have also attached a screenshot of shell commands, from a system with kernel 6.1, that clearly reproduce the issue: https://bugzilla.kernel.org/attachment.cgi?id=303704&action=edit

To reproduce the issue, set core_pattern to "core" by running 'echo core > /proc/sys/kernel/core_pattern', and set the coredump filter mask to 0x0000007B with 'echo 0x0000007B > /proc/self/coredump_filter'. Run the latter from the shell that will launch the program, since child processes inherit the setting.
The core_pattern setting ensures that a program's core file is written to the program's current working directory, for simplicity. The filter mask ensures that coredump collects shared memory mappings, both anonymous and file-backed, in addition to anonymous private mappings, ELF headers and huge pages. Note that 0x7B leaves bit 2 (file-backed private mappings) clear. This information is referenced from https://man7.org/linux/man-pages/man5/core.5.html
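
As a quick check of what the mask selects, the bits can be decoded as sketched below. The bit meanings are taken from core(5); the table and function names are made up for this example.

```python
# Bit meanings as documented in core(5).
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(mask):
    """Return the names of the mapping types selected by a coredump_filter mask."""
    return [
        name
        for bit, name in sorted(COREDUMP_FILTER_BITS.items())
        if mask & (1 << bit)
    ]


if __name__ == "__main__":
    for name in decode_coredump_filter(0x7B):
        print(name)
```

For 0x7B this lists bits 0, 1, 3, 4, 5 and 6, so both kinds of shared mapping are included in the dump.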

The commands in the screenshot are explained as follows:
- The system initially has 6.9G of available memory.
- The program fractional_memeater.bin is started with the following input parameters:
  - size of the shared memory sector in bytes
  - fraction of the shared memory sector to be written to, i.e. 1/10th in this example
  - name of the shared memory sector
- The program stalls after it successfully creates the memory mapping and writes to a fraction of the total sector (it is now waiting to be terminated).
- The system now has 6.8G of available memory and reports 100 MB of shared memory in use.
- The df utility shows that approximately 100 MB is in use in the /dev/shm directory.
- 'stat /dev/shm/shm_1G_100Mb' shows that the shared memory file is currently backed by 100 MB of data on the filesystem.
- The program is terminated with kill -6.
- The free utility shows that available memory is now 5.9G, a drop of roughly 1 GB, while shared memory usage is now 1G.
- df shows that /dev/shm is now using 1G on the filesystem.
- stat on the shared memory file shows that the file is now backed by 1G on the filesystem.
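
The before/after comparison made with the free utility can also be scripted. The sketch below reads the same counters from /proc/meminfo (MemAvailable and Shmem, which free reports as "available" and "shared"). The function names are made up for this example, and it is only an illustration of the measurement, not part of the original reproduction steps.

```python
import os


def parse_meminfo(text):
    """Return {field: value_in_kB} for lines of the form 'Name:   12345 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def delta_kb(before, after, field):
    """Change in one /proc/meminfo field between two snapshots, in kB."""
    return after.get(field, 0) - before.get(field, 0)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            before = parse_meminfo(f.read())
        input("Terminate the test program (kill -6), then press Enter: ")
        with open("/proc/meminfo") as f:
            after = parse_meminfo(f.read())
        for field in ("MemAvailable", "Shmem"):
            print(field, delta_kb(before, after, field), "kB")
```

With the behavior described above, Shmem would be expected to grow by roughly the unwritten part of the region once the coredump completes.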

Note: This system is an Ubuntu VM running a vanilla 6.1 kernel, with no other shared memory objects on the system except the one discussed. The system has also not been tainted.

Could this be expected behavior for fs/coredump.c? If not, how can we work towards a potential upstream fix?

Thanks.