Re: [PATCH v2 0/8] Return head pages from find_*_entry

From: Naresh Kamboju
Date: Tue Sep 15 2020 - 09:01:20 EST


On Tue, 15 Sep 2020 at 13:56, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> On Thu, 10 Sep 2020, Matthew Wilcox (Oracle) wrote:
>
> > This patch series started out as part of the THP patch set, but it has
> > some nice effects along the way and it seems worth splitting it out and
> > submitting separately.
> >
> > Currently find_get_entry() and find_lock_entry() return the page
> > corresponding to the requested index, but the first thing most callers do
> > is find the head page, which we just threw away. As part of auditing
> > all the callers, I found some misuses of the APIs and some plain
> > inefficiencies that I've fixed.
> >
> > The diffstat is unflattering, but I added more kernel-doc and a new wrapper.
> >
> > v2:
> > - Rework how shmem_getpage_gfp() handles getting a head page back from
> > find_lock_entry()
> > - Renamed find_get_swap_page() to find_get_incore_page()
> > - Make sure find_get_incore_page() doesn't return a head page
> > - Fix the missing include of linux/shmem_fs.h
> > - Move find_get_entry and find_lock_entry prototypes to mm/internal.h
> > - Rename thp_valid_index() to thp_contains()
> > - Fix thp_contains() for hugetlbfs and swapcache
> > - Add find_lock_head() wrapper around pagecache_get_page()
> >
> > Matthew Wilcox (Oracle) (8):
> > mm: Factor find_get_incore_page out of mincore_page
> > mm: Use find_get_incore_page in memcontrol
> > mm: Optimise madvise WILLNEED
> > proc: Optimise smaps for shmem entries
> > i915: Use find_lock_page instead of find_lock_entry
> > mm: Convert find_get_entry to return the head page
> > mm/shmem: Return head page from find_lock_entry
> > mm: Add find_lock_head

While running kselftest mincore tests the following kernel BUG reported on the
linux next-20200915 tag on x86_64, i386 and arm64.

metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git commit: 6b02addb1d1748d21dd1261e46029b264be4e5a0
git describe: next-20200915
make_kernelversion: 5.9.0-rc5
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft/linux-next/860/config

Test case:
---------------

* Tests the user interface. This test triggers most of the documented
* error conditions in mincore().
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/plain/tools/testing/selftests/mincore/mincore_selftest.c

kernel BUG:
-----------------
[ 710.472860] kselftest: Running tests in mincore
[ 710.554790] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 710.561765] #PF: supervisor read access in kernel mode
[ 710.566920] #PF: error_code(0x0000) - not-present page
[ 710.572065] PGD 80000003fd5b9067 P4D 80000003fd5b9067 PUD 45903f067 PMD 0
[ 710.578957] Oops: 0000 [#1] SMP PTI
[ 710.582450] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G
W K 5.9.0-rc5-next-20200915 #1
[ 710.592094] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 710.599574] RIP: 0010:PageHuge+0x6/0x40
[ 710.603411] Code: c3 0f 1f 00 0f 1f 44 00 00 55 48 89 d6 48 89 e5
e8 ef fe ff ff 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
00 00 55 <48> 8b 07 48 89 e5 a9 00 00 01 00 75 09 48 8b 47 08 83 e0 01
74 17
[ 710.622149] RSP: 0018:ffffb0e2002bfcc0 EFLAGS: 00010246
[ 710.627373] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 710.634498] RDX: ffff9f231b2518c0 RSI: ffffffffa272b340 RDI: 0000000000000000
[ 710.641620] RBP: ffffb0e2002bfce8 R08: 0000000000000002 R09: 0000000000000000
[ 710.648747] R10: ffffb0e2002bfb20 R11: ffffffffa272b340 R12: ffff9f23193c5e68
[ 710.655876] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[ 710.663003] FS: 00007fa4c9ea24c0(0000) GS:ffff9f231fc00000(0000)
knlGS:0000000000000000
[ 710.671088] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 710.676824] CR2: 0000000000000000 CR3: 00000004044d0004 CR4: 00000000003706f0
[ 710.683949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 710.691073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 710.698196] Call Trace:
[ 710.700644] ? find_get_incore_page+0xc6/0x120
[ 710.705089] mincore_page+0x12/0x60
[ 710.708580] __mincore_unmapped_range+0x78/0xc0
[ 710.713105] mincore_pte_range+0x269/0x300
[ 710.717206] __walk_page_range+0x5ab/0xb60
[ 710.721308] walk_page_range+0xab/0x150
[ 710.725152] __x64_sys_mincore+0x13c/0x330
[ 710.729251] do_syscall_64+0x37/0x50
[ 710.732831] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 710.737882] RIP: 0033:0x7fa4c99be2d7
[ 710.741462] Code: 73 01 c3 48 8b 0d c1 fb 2b 00 f7 d8 64 89 01 48
83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 1b 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 fb 2b 00 f7 d8 64 89
01 48
[ 710.760206] RSP: 002b:00007ffcb103baf8 EFLAGS: 00000203 ORIG_RAX:
000000000000001b
[ 710.767770] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa4c99be2d7
[ 710.774897] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fa4c9ea6000
[ 710.782020] RBP: 00007ffcb103bc40 R08: 00000000ffffffff R09: 0000000000000000
[ 710.789144] R10: 0000000000000021 R11: 0000000000000203 R12: 0000000000400d00
[ 710.796268] R13: 00007ffcb103be10 R14: 0000000000000000 R15: 0000000000000000
[ 710.803395] Modules linked in: sch_fq 8021q iptable_filter xt_mark
ip_tables cls_bpf sch_ingress veth algif_hash x86_pkg_temp_thermal
fuse [last unloaded: memory_notifier_error_inject]
[ 710.819814] CR2: 0000000000000000
[ 710.823128] ---[ end trace 67d1a6d0ea1b24e3 ]---
[ 710.827746] RIP: 0010:PageHuge+0x6/0x40
[ 710.831584] Code: c3 0f 1f 00 0f 1f 44 00 00 55 48 89 d6 48 89 e5
e8 ef fe ff ff 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
00 00 55 <48> 8b 07 48 89 e5 a9 00 00 01 00 75 09 48 8b 47 08 83 e0 01
74 17
[ 710.850322] RSP: 0018:ffffb0e2002bfcc0 EFLAGS: 00010246
[ 710.855546] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 710.862672] RDX: ffff9f231b2518c0 RSI: ffffffffa272b340 RDI: 0000000000000000
[ 710.869803] RBP: ffffb0e2002bfce8 R08: 0000000000000002 R09: 0000000000000000
[ 710.876928] R10: ffffb0e2002bfb20 R11: ffffffffa272b340 R12: ffff9f23193c5e68
[ 710.884050] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[ 710.891175] FS: 00007fa4c9ea24c0(0000) GS:ffff9f231fc00000(0000)
knlGS:0000000000000000
[ 710.899253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 710.904990] CR2: 0000000000000000 CR3: 00000004044d0004 CR4: 00000000003706f0
[ 710.912113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 710.919236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 710.926360] note: mincore_selftes[19025] exited with preempt_count 1
[ 710.932704] BUG: sleeping function called from invalid context at
/usr/src/kernel/include/linux/percpu-rwsem.h:49
[ 710.942950] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid:
19025, name: mincore_selftes
[ 710.951631] INFO: lockdep is turned off.
[ 710.955551] irq event stamp: 190
[ 710.958785] hardirqs last enabled at (189): [<ffffffffa0bda53c>]
get_page_from_freelist+0x24c/0x14b0
[ 710.967995] hardirqs last disabled at (190): [<ffffffffa18c7921>]
irqentry_enter+0x21/0x50
[ 710.976247] softirqs last enabled at (36): [<ffffffffa1c00308>]
__do_softirq+0x308/0x42a
[ 710.984419] softirqs last disabled at (11): [<ffffffffa1a00f82>]
asm_call_on_stack+0x12/0x20
[ 710.992852] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G
D W K 5.9.0-rc5-next-20200915 #1
[ 711.002496] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 711.009967] Call Trace:
[ 711.012412] dump_stack+0x7d/0x9f
[ 711.015724] ___might_sleep+0x163/0x250
[ 711.019562] __might_sleep+0x4a/0x80
[ 711.023141] exit_signals+0x33/0x2f0
[ 711.026713] do_exit+0xa9/0xcb0
[ 711.029858] ? __x64_sys_mincore+0x13c/0x330
[ 711.034125] rewind_stack_do_exit+0x17/0x20
[ 711.038308] RIP: 0033:0x7fa4c99be2d7
[ 711.041888] Code: 73 01 c3 48 8b 0d c1 fb 2b 00 f7 d8 64 89 01 48
83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 1b 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 fb 2b 00 f7 d8 64 89
01 48
[ 711.060633] RSP: 002b:00007ffcb103baf8 EFLAGS: 00000203 ORIG_RAX:
000000000000001b
[ 711.068197] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa4c99be2d7
[ 711.075321] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fa4c9ea6000
[ 711.082445] RBP: 00007ffcb103bc40 R08: 00000000ffffffff R09: 0000000000000000
[ 711.089570] R10: 0000000000000021 R11: 0000000000000203 R12: 0000000000400d00
[ 711.096693] R13: 00007ffcb103be10 R14: 0000000000000000 R15: 0000000000000000
[ 737.104310] rcu: INFO: rcu_sched self-detected stall on CPU
[ 737.109887] rcu: 0-....: (26000 ticks this GP)
idle=1a2/1/0x4000000000000000 softirq=102446/102446 fqs=6463
[ 737.119792] (t=26016 jiffies g=173197 q=1522)
[ 737.124238] NMI backtrace for cpu 0
[ 737.127731] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G
D W K 5.9.0-rc5-next-20200915 #1
[ 737.137376] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 737.144856] Call Trace:
[ 737.147309] <IRQ>
[ 737.149330] dump_stack+0x7d/0x9f
[ 737.152649] nmi_cpu_backtrace+0xa4/0xc0
[ 737.156574] ? lapic_can_unplug_cpu+0xa0/0xa0
[ 737.160931] nmi_trigger_cpumask_backtrace+0x97/0xd0
[ 737.165889] arch_trigger_cpumask_backtrace+0x19/0x20
[ 737.170933] rcu_dump_cpu_stacks+0xbc/0xec
[ 737.175025] rcu_sched_clock_irq+0x729/0x9c0
[ 737.179296] ? account_system_index_time+0x112/0x1f0
[ 737.184256] ? tick_sched_do_timer+0x60/0x60
[ 737.188528] update_process_times+0x28/0x60
[ 737.192711] tick_sched_handle.isra.21+0x34/0x50
[ 737.197322] tick_sched_timer+0x6d/0x80
[ 737.201154] __hrtimer_run_queues+0x1d0/0x450
[ 737.205514] hrtimer_interrupt+0xe7/0x240
[ 737.209527] __sysvec_apic_timer_interrupt+0x79/0x1f0
[ 737.214578] asm_call_on_stack+0x12/0x20
[ 737.218504] </IRQ>
[ 737.220611] sysvec_apic_timer_interrupt+0x75/0xa0
[ 737.225402] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 737.230541] RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0
[ 737.236017] Code: f6 85 f6 75 3e f0 0f ba 2f 08 0f 92 c0 0f b6 c0
c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 18 85 c0 75 04 eb
08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f6 c4 01 75
04 c6
[ 737.254755] RSP: 0018:ffffb0e2002bfc48 EFLAGS: 00000202
[ 737.259980] RAX: 0000000000000101 RBX: ffff9f23167158c0 RCX: 0000000000000000
[ 737.267103] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f23167158c0
[ 737.274230] RBP: ffffb0e2002bfc48 R08: 0000000000000001 R09: 0000000000000000
[ 737.281353] R10: 0000000000000000 R11: 0000000000000000 R12: 00007fa4c9ea1000
[ 737.288478] R13: 00007fa4c9ea1000 R14: ffffb0e2002bfe20 R15: 00007fa4c9ea1000
[ 737.295605] do_raw_spin_lock+0xb6/0xc0
[ 737.299440] _raw_spin_lock+0x37/0x40
[ 737.303098] ? unmap_page_range+0x4a6/0xd00
[ 737.307275] unmap_page_range+0x4a6/0xd00
[ 737.311294] unmap_single_vma+0x7d/0xf0
[ 737.315138] unmap_vmas+0xd4/0x160
[ 737.318544] exit_mmap+0xb1/0x1c0
[ 737.321866] mmput+0x6a/0x130
[ 737.324843] do_exit+0x359/0xcb0
[ 737.328076] rewind_stack_do_exit+0x17/0x20
[ 737.332260] RIP: 0033:0x7fa4c99be2d7
[ 737.335831] Code: Bad RIP value.
[ 737.339054] RSP: 002b:00007ffcb103baf8 EFLAGS: 00000203 ORIG_RAX:
000000000000001b
[ 737.346613] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa4c99be2d7
[ 737.353743] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fa4c9ea6000
[ 737.360867] RBP: 00007ffcb103bc40 R08: 00000000ffffffff R09: 0000000000000000
[ 737.367992] R10: 0000000000000021 R11: 0000000000000203 R12: 0000000000400d00
[ 737.375116] R13: 00007ffcb103be10 R14: 0000000000000000 R15: 0000000000000000
[ 815.107312] rcu: INFO: rcu_sched self-detected stall on CPU
[ 815.112890] rcu: 0-....: (103727 ticks this GP)
idle=1a2/1/0x4000000000000000 softirq=102446/102446 fqs=25897
[ 815.122966] (t=104019 jiffies g=173197 q=1545)
[ 815.127492] NMI backtrace for cpu 0
[ 815.130985] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G
D W K 5.9.0-rc5-next-20200915 #1
[ 815.140628] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 815.148099] Call Trace:
[ 815.150542] <IRQ>
[ 815.152556] dump_stack+0x7d/0x9f
[ 815.155876] nmi_cpu_backtrace+0xa4/0xc0
[ 815.159799] ? lapic_can_unplug_cpu+0xa0/0xa0
[ 815.164150] nmi_trigger_cpumask_backtrace+0x97/0xd0
[ 815.169108] arch_trigger_cpumask_backtrace+0x19/0x20
[ 815.174151] rcu_dump_cpu_stacks+0xbc/0xec
[ 815.178245] rcu_sched_clock_irq+0x729/0x9c0
[ 815.182515] ? account_system_index_time+0x112/0x1f0
[ 815.187473] ? tick_sched_do_timer+0x60/0x60
[ 815.191744] update_process_times+0x28/0x60
[ 815.195922] tick_sched_handle.isra.21+0x34/0x50
[ 815.200533] tick_sched_timer+0x6d/0x80
[ 815.204365] __hrtimer_run_queues+0x1d0/0x450
[ 815.208725] hrtimer_interrupt+0xe7/0x240
[ 815.212738] __sysvec_apic_timer_interrupt+0x79/0x1f0
[ 815.217789] asm_call_on_stack+0x12/0x20
[ 815.221713] </IRQ>
[ 815.223811] sysvec_apic_timer_interrupt+0x75/0xa0
[ 815.228597] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 815.233734] RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0

full test log link,
https://lkft.validation.linaro.org/scheduler/job/1765602#L12129
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20200915/testrun/3198585/suite/linux-log-parser/test/check-kernel-warning-1765604/log
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20200915/testrun/3198610/suite/linux-log-parser/test/check-kernel-oops-1765633/log

Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>

--
Linaro LKFT
https://lkft.linaro.org