Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

From: Tetsuo Handa
Date: Wed Sep 05 2018 - 21:00:08 EST


Michal Hocko wrote:
> On Wed 05-09-18 22:53:33, Tetsuo Handa wrote:
> > On 2018/09/05 22:40, Michal Hocko wrote:
> > > Changelog said
> > >
> > > "Although this is possible in principle let's wait for it to actually
> > > happen in real life before we make the locking more complex again."
> > >
> > > So what is the real life workload that hits it? The log you have pasted
> > > below doesn't tell much.
> >
> > Nothing special. I just ran a multi-threaded memory eater on a CONFIG_PREEMPT=y kernel.
>
> I strongly suspec that your test doesn't really represent or simulate
> any real and useful workload. Sure it triggers a rare race and we kill
> another oom victim. Does this warrant to make the code more complex?
> Well, I am not convinced, as I've said countless times.

Yes. Below is an example from a machine running Apache Web server/Tomcat AP server/PostgreSQL DB server.
An memory eater needlessly killed Tomcat due to this race. I assert that we should fix af5679fbc669f31f.



Before:

systemd(1)-+-NetworkManager(693)-+-dhclient(791)
| |-{NetworkManager}(698)
| `-{NetworkManager}(702)
|-abrtd(630)
|-agetty(1007)
|-atd(653)
|-auditd(600)---{auditd}(601)
|-avahi-daemon(625)---avahi-daemon(631)
|-crond(657)
|-dbus-daemon(638)
|-firewalld(661)---{firewalld}(788)
|-httpd(1169)-+-httpd(1170)
| |-httpd(1171)
| |-httpd(1172)
| |-httpd(1173)
| `-httpd(1174)
|-irqbalance(628)
|-java(1074)-+-{java}(1092)
| |-{java}(1093)
| |-{java}(1094)
| |-{java}(1095)
| |-{java}(1096)
| |-{java}(1097)
| |-{java}(1098)
| |-{java}(1099)
| |-{java}(1100)
| |-{java}(1101)
| |-{java}(1102)
| |-{java}(1103)
| |-{java}(1104)
| |-{java}(1105)
| |-{java}(1106)
| |-{java}(1107)
| |-{java}(1108)
| |-{java}(1109)
| |-{java}(1110)
| |-{java}(1111)
| |-{java}(1114)
| |-{java}(1115)
| |-{java}(1116)
| |-{java}(1117)
| |-{java}(1118)
| |-{java}(1119)
| |-{java}(1120)
| |-{java}(1121)
| |-{java}(1122)
| |-{java}(1123)
| |-{java}(1124)
| |-{java}(1125)
| |-{java}(1126)
| |-{java}(1127)
| |-{java}(1128)
| |-{java}(1129)
| |-{java}(1130)
| |-{java}(1131)
| |-{java}(1132)
| |-{java}(1133)
| |-{java}(1134)
| |-{java}(1135)
| |-{java}(1136)
| |-{java}(1137)
| `-{java}(1138)
|-ksmtuned(659)---sleep(1727)
|-login(1006)---bash(1052)---pstree(1728)
|-polkitd(624)-+-{polkitd}(633)
| |-{polkitd}(642)
| |-{polkitd}(643)
| |-{polkitd}(645)
| `-{polkitd}(650)
|-postgres(1154)-+-postgres(1155)
| |-postgres(1157)
| |-postgres(1158)
| |-postgres(1159)
| |-postgres(1160)
| `-postgres(1161)
|-rsyslogd(986)-+-{rsyslogd}(997)
| `-{rsyslogd}(999)
|-sendmail(1008)
|-sendmail(1023)
|-smbd(983)-+-cleanupd(1027)
| |-lpqd(1032)
| `-smbd-notifyd(1026)
|-sshd(981)
|-systemd-journal(529)
|-systemd-logind(627)
|-systemd-udevd(560)
`-tuned(980)-+-{tuned}(1030)
|-{tuned}(1031)
|-{tuned}(1033)
`-{tuned}(1047)



After:

systemd(1)-+-NetworkManager(693)-+-dhclient(791)
| |-{NetworkManager}(698)
| `-{NetworkManager}(702)
|-abrtd(630)
|-agetty(1007)
|-atd(653)
|-auditd(600)---{auditd}(601)
|-avahi-daemon(625)---avahi-daemon(631)
|-crond(657)
|-dbus-daemon(638)
|-firewalld(661)---{firewalld}(788)
|-httpd(1169)-+-httpd(1170)
| |-httpd(1171)
| |-httpd(1172)
| |-httpd(1173)
| `-httpd(1174)
|-irqbalance(628)
|-ksmtuned(659)---sleep(1758)
|-login(1006)---bash(1052)---pstree(1759)
|-polkitd(624)-+-{polkitd}(633)
| |-{polkitd}(642)
| |-{polkitd}(643)
| |-{polkitd}(645)
| `-{polkitd}(650)
|-postgres(1154)-+-postgres(1155)
| |-postgres(1157)
| |-postgres(1158)
| |-postgres(1159)
| |-postgres(1160)
| `-postgres(1161)
|-rsyslogd(986)-+-{rsyslogd}(997)
| `-{rsyslogd}(999)
|-sendmail(1008)
|-sendmail(1023)
|-smbd(983)-+-cleanupd(1027)
| |-lpqd(1032)
| `-smbd-notifyd(1026)
|-sshd(981)
|-systemd-journal(529)
|-systemd-logind(627)
|-systemd-udevd(560)
`-tuned(980)-+-{tuned}(1030)
|-{tuned}(1031)
|-{tuned}(1033)
`-{tuned}(1047)



[ 222.165946] a.out invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[ 222.170631] a.out cpuset=/ mems_allowed=0
[ 222.172956] CPU: 4 PID: 1748 Comm: a.out Tainted: G T 4.19.0-rc2+ #690
[ 222.176517] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 222.180892] Call Trace:
[ 222.182947] dump_stack+0x85/0xcb
[ 222.185240] dump_header+0x69/0x2fe
[ 222.187579] ? _raw_spin_unlock_irqrestore+0x41/0x70
[ 222.190319] oom_kill_process+0x307/0x390
[ 222.192803] out_of_memory+0x2f3/0x5d0
[ 222.195190] __alloc_pages_slowpath+0xc01/0x1030
[ 222.197844] __alloc_pages_nodemask+0x333/0x390
[ 222.200452] alloc_pages_vma+0x77/0x1f0
[ 222.202869] __handle_mm_fault+0x81c/0xf40
[ 222.205334] handle_mm_fault+0x1b7/0x3c0
[ 222.207712] __do_page_fault+0x2a6/0x580
[ 222.210036] do_page_fault+0x32/0x270
[ 222.212266] ? page_fault+0x8/0x30
[ 222.214402] page_fault+0x1e/0x30
[ 222.216463] RIP: 0033:0x4008d8
[ 222.218429] Code: Bad RIP value.
[ 222.220388] RSP: 002b:00007fff34061350 EFLAGS: 00010206
[ 222.222931] RAX: 00007efea3c2e010 RBX: 0000000100000000 RCX: 0000000000000000
[ 222.225976] RDX: 00000000b190f000 RSI: 0000000000020000 RDI: 0000000200000050
[ 222.228891] RBP: 00007efea3c2e010 R08: 0000000200001000 R09: 0000000000021000
[ 222.231779] R10: 0000000000000022 R11: 0000000000001000 R12: 0000000000000006
[ 222.234626] R13: 00007fff34061440 R14: 0000000000000000 R15: 0000000000000000
[ 222.238482] Mem-Info:
[ 222.240511] active_anon:789816 inactive_anon:3457 isolated_anon:0
[ 222.240511] active_file:11 inactive_file:44 isolated_file:0
[ 222.240511] unevictable:0 dirty:6 writeback:0 unstable:0
[ 222.240511] slab_reclaimable:8052 slab_unreclaimable:24408
[ 222.240511] mapped:1898 shmem:3704 pagetables:4316 bounce:0
[ 222.240511] free:20841 free_pcp:0 free_cma:0
[ 222.254349] Node 0 active_anon:3159264kB inactive_anon:13828kB active_file:44kB inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:7592kB dirty:24kB writeback:0kB shmem:14816kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2793472kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 222.264038] Node 0 DMA free:13812kB min:308kB low:384kB high:460kB active_anon:1876kB inactive_anon:8kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15960kB managed:15876kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 222.274208] lowmem_reserve[]: 0 2674 3378 3378
[ 222.276831] Node 0 DMA32 free:56068kB min:53260kB low:66572kB high:79884kB active_anon:2673292kB inactive_anon:216kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129152kB managed:2738564kB mlocked:0kB kernel_stack:96kB pagetables:3024kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 222.287961] lowmem_reserve[]: 0 0 703 703
[ 222.291154] Node 0 Normal free:13672kB min:14012kB low:17512kB high:21012kB active_anon:483864kB inactive_anon:13604kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:1048576kB managed:720644kB mlocked:0kB kernel_stack:7520kB pagetables:14272kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 222.302407] lowmem_reserve[]: 0 0 0 0
[ 222.304596] Node 0 DMA: 1*4kB (M) 0*8kB 1*16kB (U) 7*32kB (UM) 4*64kB (U) 4*128kB (UM) 2*256kB (U) 0*512kB 0*1024kB 0*2048kB 3*4096kB (ME) = 13812kB
[ 222.311748] Node 0 DMA32: 37*4kB (U) 29*8kB (UM) 20*16kB (UM) 30*32kB (UME) 28*64kB (UME) 11*128kB (UME) 9*256kB (UME) 8*512kB (UM) 6*1024kB (UME) 1*2048kB (E) 9*4096kB (UM) = 56316kB
[ 222.318932] Node 0 Normal: 151*4kB (UM) 2*8kB (UM) 97*16kB (UM) 195*32kB (UME) 53*64kB (UME) 11*128kB (UME) 2*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13724kB
[ 222.325455] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 222.329059] 3707 total pagecache pages
[ 222.331315] 0 pages in swap cache
[ 222.333440] Swap cache stats: add 0, delete 0, find 0/0
[ 222.336097] Free swap = 0kB
[ 222.338091] Total swap = 0kB
[ 222.340084] 1048422 pages RAM
[ 222.342165] 0 pages HighMem/MovableOnly
[ 222.344460] 179651 pages reserved
[ 222.347423] 0 pages cma reserved
[ 222.349793] 0 pages hwpoisoned
[ 222.351784] Out of memory: Kill process 1748 (a.out) score 838 or sacrifice child
[ 222.355131] Killed process 1748 (a.out) total-vm:4267252kB, anon-rss:2909224kB, file-rss:0kB, shmem-rss:0kB
[ 222.359644] java invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[ 222.364180] java cpuset=/ mems_allowed=0
[ 222.366619] CPU: 0 PID: 1110 Comm: java Tainted: G T 4.19.0-rc2+ #690
[ 222.370088] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 222.377089] Call Trace:
[ 222.380503] dump_stack+0x85/0xcb
[ 222.380509] dump_header+0x69/0x2fe
[ 222.380515] ? _raw_spin_unlock_irqrestore+0x41/0x70
[ 222.380518] oom_kill_process+0x307/0x390
[ 222.380553] out_of_memory+0x2f3/0x5d0
[ 222.396414] __alloc_pages_slowpath+0xc01/0x1030
[ 222.396423] __alloc_pages_nodemask+0x333/0x390
[ 222.396431] filemap_fault+0x465/0x910
[ 222.404090] ? xfs_ilock+0xbf/0x2b0 [xfs]
[ 222.404118] ? __xfs_filemap_fault+0x7d/0x2c0 [xfs]
[ 222.404124] ? down_read_nested+0x66/0xa0
[ 222.404148] __xfs_filemap_fault+0x8e/0x2c0 [xfs]
[ 222.404156] __do_fault+0x11/0x133
[ 222.404158] __handle_mm_fault+0xa57/0xf40
[ 222.404165] handle_mm_fault+0x1b7/0x3c0
[ 222.404171] __do_page_fault+0x2a6/0x580
[ 222.404187] do_page_fault+0x32/0x270
[ 222.404194] ? page_fault+0x8/0x30
[ 222.404196] page_fault+0x1e/0x30
[ 222.404199] RIP: 0033:0x7fedb229ed42
[ 222.404205] Code: Bad RIP value.
[ 222.404207] RSP: 002b:00007fed92ae9c90 EFLAGS: 00010202
[ 222.404209] RAX: ffffffffffffff92 RBX: 00007fedb187c470 RCX: 00007fedb229ed42
[ 222.404210] RDX: 0000000000000001 RSI: 0000000000000089 RDI: 00007fedac13c354
[ 222.404211] RBP: 00007fed92ae9d50 R08: 00007fedac13c328 R09: 00000000ffffffff
[ 222.404212] R10: 00007fed92ae9cf0 R11: 0000000000000202 R12: 0000000000000001
[ 222.404213] R13: 00007fed92ae9cf0 R14: ffffffffffffff92 R15: 00007fedac13c300
[ 222.404783] Mem-Info:
[ 222.404790] active_anon:429056 inactive_anon:3457 isolated_anon:0
[ 222.404790] active_file:0 inactive_file:833 isolated_file:0
[ 222.404790] unevictable:0 dirty:0 writeback:0 unstable:0
[ 222.404790] slab_reclaimable:8052 slab_unreclaimable:24344
[ 222.404790] mapped:2375 shmem:3704 pagetables:3030 bounce:0
[ 222.404790] free:381368 free_pcp:89 free_cma:0
[ 222.404793] Node 0 active_anon:1716224kB inactive_anon:13828kB active_file:0kB inactive_file:3332kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:9500kB dirty:0kB writeback:0kB shmem:14816kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 155648kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 222.404794] Node 0 DMA free:13812kB min:308kB low:384kB high:460kB active_anon:1876kB inactive_anon:8kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15960kB managed:15876kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 222.404798] lowmem_reserve[]: 0 2674 3378 3378
[ 222.404802] Node 0 DMA32 free:1362940kB min:53260kB low:66572kB high:79884kB active_anon:1366928kB inactive_anon:216kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129152kB managed:2738564kB mlocked:0kB kernel_stack:96kB pagetables:3028kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 222.404831] lowmem_reserve[]: 0 0 703 703
[ 222.404837] Node 0 Normal free:149160kB min:14012kB low:17512kB high:21012kB active_anon:348116kB inactive_anon:13604kB active_file:0kB inactive_file:3080kB unevictable:0kB writepending:0kB present:1048576kB managed:720644kB mlocked:0kB kernel_stack:7504kB pagetables:9088kB bounce:0kB free_pcp:376kB local_pcp:12kB free_cma:0kB
[ 222.404841] lowmem_reserve[]: 0 0 0 0
[ 222.404859] Node 0 DMA: 1*4kB (M) 0*8kB 1*16kB (U) 7*32kB (UM) 4*64kB (U) 4*128kB (UM) 2*256kB (U) 0*512kB 0*1024kB 0*2048kB 3*4096kB (ME) = 13812kB
[ 222.405114] Node 0 DMA32: 37*4kB (U) 29*8kB (UM) 20*16kB (UM) 30*32kB (UME) 28*64kB (UME) 11*128kB (UME) 9*256kB (UME) 8*512kB (UM) 6*1024kB (UME) 10*2048kB (ME) 326*4096kB (UM) = 1373180kB
[ 222.405423] Node 0 Normal: 512*4kB (U) 1075*8kB (UM) 1667*16kB (UM) 1226*32kB (UME) 497*64kB (UME) 209*128kB (UME) 50*256kB (UM) 0*512kB 0*1024kB 0*2048kB 1*4096kB (M) = 152008kB
[ 222.405797] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 222.405799] 4655 total pagecache pages
[ 222.405801] 0 pages in swap cache
[ 222.405802] Swap cache stats: add 0, delete 0, find 0/0
[ 222.405803] Free swap = 0kB
[ 222.405803] Total swap = 0kB
[ 222.405835] 1048422 pages RAM
[ 222.405837] 0 pages HighMem/MovableOnly
[ 222.405838] 179651 pages reserved
[ 222.405839] 0 pages cma reserved
[ 222.405840] 0 pages hwpoisoned
[ 222.405843] Out of memory: Kill process 1074 (java) score 50 or sacrifice child
[ 222.406136] Killed process 1074 (java) total-vm:5555688kB, anon-rss:174244kB, file-rss:0kB, shmem-rss:0kB
[ 222.443446] oom_reaper: reaped process 1748 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB