[6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

From: Bruno Goncalves
Date: Tue Feb 14 2023 - 08:24:24 EST


Hello,

Recently, when testing kernels from net-next [1] and bpf-next [2] built
with debug options enabled, we hit the following call trace:

[ 92.539335] be2net 0000:04:00.0: FW config: function_mode=0x10003, function_caps=0x7
[ 92.559345] scsi host1: BC_356 : error in cmd completion: Subsystem : 1 Opcode : 191 status(compl/extd)=2/30
[ 92.560448] scsi host1: BG_1597 : HBA error recovery not supported
[ 92.587657] be2net 0000:04:00.0: Max: txqs 16, rxqs 17, rss 16, eqs 16, vfs 0
[ 92.588471] be2net 0000:04:00.0: Max: uc-macs 30, mc-macs 64, vlans 64
[ 93.731235] be2net 0000:04:00.0: enabled 8 MSI-x vector(s) for NIC
[ 93.749741] ==================================================================
[ 93.750521] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110
[ 93.751233] Read of size 8 at addr ffff888104719758 by task kworker/0:2/108
[ 93.751601]
[ 93.752087] CPU: 0 PID: 108 Comm: kworker/0:2 Tainted: G I 6.2.0-rc7 #1
[ 93.752549] Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
[ 93.752884] Workqueue: events work_for_cpu_fn
[ 93.753510] Call Trace:
[ 93.753687] <TASK>
[ 93.754215] dump_stack_lvl+0x55/0x71
[ 93.754449] print_report+0x184/0x4b1
[ 93.754697] ? __virt_addr_valid+0xe8/0x160
[ 93.754972] ? hop_cmp+0x26/0x110
[ 93.755533] kasan_report+0xa5/0xe0
[ 93.756193] ? hop_cmp+0x26/0x110
[ 93.756767] ? __pfx_hop_cmp+0x10/0x10
[ 93.756990] ? hop_cmp+0x26/0x110
[ 93.757556] ? __pfx_hop_cmp+0x10/0x10
[ 93.757774] ? bsearch+0x53/0x80
[ 93.758838] ? sched_numa_find_nth_cpu+0x128/0x360
[ 93.759492] ? __pfx_sched_numa_find_nth_cpu+0x10/0x10
[ 93.759792] ? alloc_cpumask_var_node+0x38/0x60
[ 93.760419] ? rcu_read_lock_sched_held+0x3f/0x80
[ 93.761060] ? trace_kmalloc+0x33/0xf0
[ 93.761306] ? __kmalloc_node+0x76/0xc0
[ 93.761528] ? cpumask_local_spread+0x44/0xc0
[ 93.762192] ? be_setup_queues+0x13b/0x3c0 [be2net]
[ 93.762957] ? be_setup+0x663/0xa60 [be2net]
[ 93.763795] ? __pfx_be_setup+0x10/0x10 [be2net]
[ 93.764523] ? is_module_address+0x2b/0x50
[ 93.764744] ? is_module_address+0x2b/0x50
[ 93.764996] ? static_obj+0x6b/0x80
[ 93.765865] ? lockdep_init_map_type+0xcf/0x370
[ 93.766527] ? be_probe+0x825/0xcd0 [be2net]
[ 93.767224] ? __pfx_be_probe+0x10/0x10 [be2net]
[ 93.767932] ? preempt_count_sub+0xb7/0x100
[ 93.768181] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 93.768450] ? __pfx_be_probe+0x10/0x10 [be2net]
[ 93.769162] ? local_pci_probe+0x77/0xc0
[ 93.769392] ? __pfx_local_pci_probe+0x10/0x10
[ 93.770007] ? work_for_cpu_fn+0x29/0x40
[ 93.770253] ? process_one_work+0x543/0xa20
[ 93.770490] ? __pfx_process_one_work+0x10/0x10
[ 93.797773pin_lock+0x10/0x10
[ 93.871656] ? __list_add_valid+0x3f/0x70
[ 93.871874] ? move_linked_works+0x103/0x140
[ 93.872487] ? worker_thread+0x364/0x630
[ 93.872704] ? __kthread_parkme+0xd8/0xf0
[ 93.872919] ? __pfx_worker_thread+0x10/0x10
[ 93.873513] ? kthread+0x17e/0x1b0
[ 93.874055] ? __pfx_kthread+0x10/0x10
[ 93.874290] ? ret_from_fork+0x2c/0x50
[ 93.874541] </TASK>
[ 93.874727]
[ 93.875188] Allocated by task 1:
[ 93.875733] kasan_save_stack+0x34/0x60
[ 93.875942] kasan_set_track+0x21/0x30
[ 93.876164] __kasan_kmalloc+0xa9/0xb0
[ 93.876373] __kmalloc+0x57/0xd0
[ 93.876918] sched_init_numa+0x21f/0x7e0
[ 93.877146] sched_init_smp+0x6d/0x113
[ 93.877358] kernel_init_freeable+0x2a3/0x4a0
[ 93.877993] kernel_init+0x18/0x160
[ 93.878592] ret_from_fork+0x2c/0x50
[ 93.878811]
[ 93.879278] The buggy address belongs to the object at ffff888104719760
[ 93.879278] which belongs to the cache kmalloc-16 of size 16
[ 93.879926] The buggy address is located 8 bytes to the left of
[ 93.879926] 16-byte region [ffff888104719760, ffff888104719770)
[ 94.363686] flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
[ 94.381131] raw: 0017ffffc0000200 ffff88810004c580 ffffea000400df50 ffffea0004165190
[ 94.381554] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
[ 94.381958] page dumped because: kasan: bad access detected
[ 94.382249]
[ 94.382710] Memory state around the buggy address:
[ 94.383319] ffff888104719600: fc fc fc fc fc fc fc fc fa fb fc fc fc fc fc fc
[ 94.384066] ffff888104719680: fc fc fc fc fc fc fc fc fc fc 00 00 fc fc fc fc
[ 94.384841] >ffff888104719700: fc fc fc fc fc fc fc fc fc fc fc fc 00 00 fc fc
[ 94.385573] ^
[ 94.386251] ffff888104719780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 00 00
[ 94.386989] ffff888104719800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 94.387710] ==================================================================
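
As a sanity check on the report above, the addresses it prints can be
cross-checked against the shadow dump. The sketch below is not part of the
console output. It assumes generic KASAN, where one shadow byte describes
8 bytes of memory, so each dumped row of 16 shadow bytes covers 128 bytes.
The function names are made up for illustration; the addresses and the
shadow row are copied from the log.

```python
GRANULE = 8                 # assumption: generic KASAN, 8 bytes per shadow byte
ROW_BYTES = 16 * GRANULE    # one dumped row = 16 shadow bytes = 128 bytes

# Values copied from the report.
access = 0xFFFF888104719758                      # "Read of size 8 at addr ..."
obj_start, obj_end = 0xFFFF888104719760, 0xFFFF888104719770
row = "fc fc fc fc fc fc fc fc fc fc fc fc 00 00 fc fc".split()


def distance_left_of(addr, start):
    """Bytes between an access address and the start of the object."""
    return start - addr


def shadow_index(addr, row_base):
    """Index of the shadow byte describing addr within its dumped row."""
    return (addr - row_base) // GRANULE


# Base address of the dumped row that contains the access.
row_base = access & ~(ROW_BYTES - 1)

if __name__ == "__main__":
    print(hex(row_base))                               # row marked with '>'
    print(distance_left_of(access, obj_start))         # bytes left of object
    i = shadow_index(access, row_base)
    print(i, row[i])                                   # shadow byte for the read
    first = shadow_index(obj_start, row_base)
    last = shadow_index(obj_end - 1, row_base)
    print(row[first:last + 1])                         # shadow bytes of object
```

With these inputs the access falls in the row starting at
ffff888104719700, 8 bytes before the object, on a shadow byte of fc, while
the two shadow bytes for the 16-byte object itself are 00. That matches
the "8 bytes to the left" line in the report: the read lands in the
redzone immediately preceding the kmalloc-16 object.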

full console log:
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/3762562309/redhat:776235046/build_x86_64_redhat:776235046-x86_64-kernel-debug/tests/1/results_0001/job.01/recipes/13385613/tasks/5/logs/test_console.log

test logs: https://datawarehouse.cki-project.org/kcidb/tests/7075911
cki issue tracker: https://datawarehouse.cki-project.org/issue/1896

kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/build%20x86_64%20debug/3762562279/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.config
kernel tarball:
https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/publish%20x86_64%20debug/3762562289/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.tar.gz

The first commit we tested that hit the problem is [3], but we have not
bisected to find which commit introduced the issue.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
[2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
[3] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=0243d3dfe274832aa0a16214499c208122345173

Thanks,
Bruno Goncalves