RE: 5.1 kernel: khugepaged stuck at 100%

From: Justin Piszcz
Date: Wed May 15 2019 - 03:27:18 EST




> -----Original Message-----
> From: Kirill A. Shutemov [mailto:kirill@xxxxxxxxxxxxx]
> Sent: Thursday, May 09, 2019 7:14 AM
> To: Justin Piszcz
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: 5.1 kernel: khugepaged stuck at 100%
>
> On Thu, May 09, 2019 at 05:54:56AM -0400, Justin Piszcz wrote:

[ ... ]

> > Thoughts on debugging / before reboot to clear this up?
>
> Could you check what khugepaged doing?
>
> cat /proc/$(pidof khugepaged)/stack

Hello, the next time I see this (khugepage at 100%) I will provide the
requested info.

However, this time I'm getting a kernel read fault (below) under 5.1.1,
under 5.0.x and earlier I've never seen any of these errors before:

[54101.281515] BUG: unable to handle kernel paging request at
ffffea0002060000
[54101.281519] #PF error: [normal kernel read fault]
[54101.281521] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
[54101.281524] Oops: 0000 [#1] SMP PTI
[54101.281527] CPU: 0 PID: 77 Comm: khugepaged Tainted: G T
5.1.1 #1
[54101.281528] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2
01/16/2015
[54101.281534] RIP: 0010:isolate_freepages_block+0xb9/0x310
[54101.281535] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49 8d 45
79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef 00 00 00 <48>
8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01 0f 84
[54101.281537] RSP: 0018:ffffc900003a7860 EFLAGS: 00010246
[54101.281538] RAX: 0000000000000000 RBX: ffffea0002060000 RCX:
ffffc900003a7b69
[54101.281539] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffffffff828c4790
[54101.281540] RBP: 0000000000081800 R08: 0000000000000001 R09:
0000000000000000
[54101.281542] R10: 0000000000000202 R11: ffffffff828c43d0 R12:
0000000000000000
[54101.281543] R13: 0000000000000000 R14: ffffc900003a7af0 R15:
0000000000000000
[54101.281544] FS: 0000000000000000(0000) GS:ffff88903f800000(0000)
knlGS:0000000000000000
[54101.281546] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54101.281547] CR2: ffffea0002060000 CR3: 000000000280e002 CR4:
00000000001606f0
[54101.281548] Call Trace:
[54101.281552] compaction_alloc+0x83d/0x8d0
[54101.281555] migrate_pages+0x30d/0x750
[54101.281557] ? isolate_migratepages_block+0xa10/0xa10
[54101.281559] ? __reset_isolation_suitable+0x110/0x110
[54101.281561] compact_zone+0x684/0xa70
[54101.281563] compact_zone_order+0x109/0x150
[54101.281566] ? schedule_timeout+0x1ba/0x290
[54101.281569] ? record_times+0x13/0xa0
[54101.281571] try_to_compact_pages+0x10d/0x220
[54101.281574] __alloc_pages_direct_compact+0x93/0x180
[54101.281576] __alloc_pages_nodemask+0x6c7/0xe20
[54101.281579] ? __wake_up_common_lock+0xb0/0xb0
[54101.281581] khugepaged+0x31f/0x19b0
[54101.281584] ? __wake_up_common_lock+0xb0/0xb0
[54101.281586] ? collapse_shmem.isra.4+0xc20/0xc20
[54101.281589] kthread+0x10a/0x120
[54101.281592] ? __kthread_create_on_node+0x1b0/0x1b0
[54101.281595] ret_from_fork+0x35/0x40
[54101.281597] CR2: ffffea0002060000
[54101.281599] ---[ end trace d9bfd1c296893a64 ]---
[54101.281601] RIP: 0010:isolate_freepages_block+0xb9/0x310
[54101.281602] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49 8d 45
79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef 00 00 00 <48>
8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01 0f 84
[54101.281604] RSP: 0018:ffffc900003a7860 EFLAGS: 00010246
[54101.281605] RAX: 0000000000000000 RBX: ffffea0002060000 RCX:
ffffc900003a7b69
[54101.281606] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffffffff828c4790
[54101.281607] RBP: 0000000000081800 R08: 0000000000000001 R09:
0000000000000000
[54101.281608] R10: 0000000000000202 R11: ffffffff828c43d0 R12:
0000000000000000
[54101.281609] R13: 0000000000000000 R14: ffffc900003a7af0 R15:
0000000000000000
[54101.281610] FS: 0000000000000000(0000) GS:ffff88903f800000(0000)
knlGS:0000000000000000
[54101.281611] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54101.281613] CR2: ffffea0002060000 CR3: 000000000280e002 CR4:
00000000001606f0