cat /proc/sys/kernel/tainted => page allocation failure: order:6

From: Rantala, Tommi T. (Nokia - FI/Espoo)
Date: Mon May 10 2021 - 08:50:09 EST


Hello,

Why is order 6 allocation required for "cat /proc/sys/kernel/tainted" ...?
I'm seeing (occasional) failures in one VM (with 65d uptime):

[5674989.634561] cat: page allocation failure: order:6, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=user.slice,mems_allowed=0
[5674989.645432] CPU: 0 PID: 2717524 Comm: cat Not tainted 5.10.19-200.fc33.x86_64 #1
[5674989.648623] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[5674989.651622] Call Trace:
[5674989.653068] dump_stack+0x6b/0x83
[5674989.654744] warn_alloc.cold+0x75/0xd9
[5674989.656545] ? _cond_resched+0x16/0x40
[5674989.658365] ? __alloc_pages_direct_compact+0x144/0x150
[5674989.660669] __alloc_pages_slowpath.constprop.0+0xc91/0xcd0
[5674989.663129] __alloc_pages_nodemask+0x30a/0x340
[5674989.665189] kmalloc_order+0x28/0x80
[5674989.666959] kmalloc_order_trace+0x19/0x80
[5674989.668893] proc_sys_call_handler+0xab/0x230
[5674989.670898] ? handle_mm_fault+0x1036/0x1970
[5674989.672881] new_sync_read+0x105/0x180
[5674989.674659] vfs_read+0x14b/0x1a0
[5674989.676284] ksys_read+0x4f/0xc0
[5674989.677911] do_syscall_64+0x33/0x40
[5674989.679625] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[5674989.681799] RIP: 0033:0x7fae58271442
[5674989.683473] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 52 39 0a 00 e8 75 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[5674989.690565] RSP: 002b:00007ffd8f99e018 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[5674989.693546] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fae58271442
[5674989.696435] RDX: 0000000000020000 RSI: 00007fae5815c000 RDI: 0000000000000003
[5674989.699393] RBP: 00007fae5815c000 R08: 00007fae5815b010 R09: 0000000000000000
[5674989.702220] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
[5674989.705079] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[5674989.708146] Mem-Info:
[5674989.709501] active_anon:35569 inactive_anon:77146 isolated_anon:0
active_file:2408438 inactive_file:1402557 isolated_file:0
unevictable:0 dirty:406 writeback:0
slab_reclaimable:71277 slab_unreclaimable:23932
mapped:62922 shmem:36545 pagetables:3505 bounce:0
free:60719 free_pcp:0 free_cma:0
[5674989.722922] Node 0 active_anon:142276kB inactive_anon:308584kB active_file:9633752kB inactive_file:5610228kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:251688kB dirty:1820kB writeback:0kB shmem:146180kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4608kB all_unreclaimable? no
[5674989.734181] Node 0 DMA free:13860kB min:64kB low:80kB high:96kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[5674989.748718] lowmem_reserve[]: 0 2966 15955 15955 15955
[5674989.750956] Node 0 DMA32 free:155888kB min:12552kB low:15688kB high:18824kB reserved_highatomic:0KB active_anon:336kB inactive_anon:98108kB active_file:1759028kB inactive_file:969280kB unevictable:0kB writepending:580kB present:3129192kB managed:3063656kB mlocked:0kB pagetables:224kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[5674989.762669] lowmem_reserve[]: 0 0 12988 12988 12988
[5674989.764847] Node 0 Normal free:73128kB min:54960kB low:68700kB high:82440kB reserved_highatomic:2048KB active_anon:141940kB inactive_anon:210380kB active_file:7874940kB inactive_file:4639712kB unevictable:0kB writepending:1044kB present:13631488kB managed:13307412kB mlocked:0kB pagetables:13796kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[5674989.776607] lowmem_reserve[]: 0 0 0 0 0
[5674989.778586] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 13860kB
[5674989.784184] Node 0 DMA32: 3012*4kB (UME) 2410*8kB (UME) 6740*16kB (UME) 152*32kB (UME) 202*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 156960kB
[5674989.790071] Node 0 Normal: 3437*4kB (UMEH) 2971*8kB (UMEH) 1108*16kB (UMEH) 321*32kB (UEH) 92*64kB (UEH) 13*128kB (UEH) 0*256kB 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 74092kB
[5674989.796591] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[5674989.800230] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[5674989.803812] 3847327 total pagecache pages
[5674989.805797] 0 pages in swap cache
[5674989.807596] Swap cache stats: add 0, delete 0, find 0/0
[5674989.810045] Free swap = 0kB
[5674989.811606] Total swap = 0kB
[5674989.813254] 4194168 pages RAM
[5674989.814930] 0 pages HighMem/MovableOnly
[5674989.816823] 97424 pages reserved
[5674989.818505] 0 pages cma reserved
[5674989.820185] 0 pages hwpoisoned


Memory is available:

$ cat /proc/meminfo
MemTotal: 16386976 kB
MemFree: 2917472 kB
MemAvailable: 15395272 kB
Buffers: 590128 kB
Cached: 12106492 kB
SwapCached: 0 kB
...

cat uses 128kB buffer:

$ strace -y -e read cat /proc/sys/kernel/tainted
read(3</proc/sys/kernel/tainted>, "0\n", 131072) = 2
0

Seems to be introduced in v5.8-rc1:

commit 32927393dc1ccd60fb2bdc05b9e8e88753761469
Author: Christoph Hellwig <hch@xxxxxx>
Date: Fri Apr 24 08:43:38 2020 +0200

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

Could this be improved to allow cat'ting of files in /proc/sys/ without the
big kernel memory allocation?

-Tommi