Re: use-after-free in bfq_idle_slice_timer()

From: Guenter Roeck
Date: Wed May 15 2019 - 21:50:25 EST


Hi Paolo,

On 5/15/19 11:20 AM, Paolo Valente wrote:


Il giorno 9 mag 2019, alle ore 18:17, Guenter Roeck <linux@xxxxxxxxxxxx> ha scritto:

Hi Paolo,

On Thu, May 09, 2019 at 05:14:57PM +0200, Paolo Valente wrote:
I couldn't find any explanation. Would you be willing to apply a
patch that adds runtime checks in bfq? It'd take me a few days to
prepare it.


Sure, we can do that.


Ok, (compressed) patch attached. It applies cleanly on top of current mainline HEAD, so it should apply on top of your branch too. I'm sorry, but I couldn't test the patch thoroughly, because the current mainline kernel systematically corrupts the fs on my main test machine. If the patch does introduce unwanted crashes, I'll fix it.

Outch.

Looking forward to your feedback,

Unfortunately I'll be off on vacation starting tomorrow and all next week.
I'll only be able to look into this after I am back.

Guenter

Paolo




Thanks,
Guenter

Thanks,
Paolo

Il giorno 8 mag 2019, alle ore 21:50, Guenter Roeck <linux@xxxxxxxxxxxx> ha scritto:

Hi,

we have seen the attached use-after-free in bfq_idle_slice_timer()
while running reboot tests. The code in these tests includes all
bfq patches up to and including commit eed47d19d936 ("block, bfq:
fix use after free in bfq_bfqq_expire").

gdb points to the dereference of struct bfq_queue *bfqq in
bfq_bfqq_budget_timeout(). My suspicion is that the cleanup path
in bfq_put_queue() may not clear bfqd->in_service_queue, but I don't
understand the code well enough to be sure.

Any thoughts / comments ?

Thanks,
Guenter

---
[ 25.307269] ==================================================================
[ 25.314555] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0x88/0x1d4
[ 25.321359] Read of size 8 at addr fffffff089360440 by task swapper/0/0
[ 25.327975]
[ 25.329487] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S 4.19.38 #37
[ 25.336974] Hardware name: <...> rev2 board (DT)
[ 25.342378] Call trace:
[ 25.344844] dump_backtrace+0x0/0x358
[ 25.348521] show_stack+0x20/0x2c
[ 25.351849] dump_stack+0x130/0x19c
[ 25.355353] print_address_description+0x74/0x250
[ 25.360068] kasan_report+0x27c/0x2a0
[ 25.363742] __asan_report_load8_noabort+0x2c/0x38
[ 25.368546] bfq_idle_slice_timer+0x88/0x1d4
[ 25.372829] __hrtimer_run_queues+0x794/0xa34
[ 25.377197] hrtimer_interrupt+0x278/0x600
[ 25.381310] arch_timer_handler_phys+0x5c/0x6c
[ 25.385767] handle_percpu_devid_irq+0x2e4/0x754
[ 25.390399] __handle_domain_irq+0xd4/0x158
[ 25.394593] gic_handle_irq+0x208/0x260
[ 25.398437] el1_irq+0xb0/0x128
[ 25.401591] arch_cpu_idle+0x20c/0x548
[ 25.405352] do_idle+0x184/0x4dc
[ 25.408590] cpu_startup_entry+0x24/0x28
[ 25.412526] rest_init+0x114/0x148
[ 25.415939] start_kernel+0x4c8/0x5c4
[ 25.419605]
[ 25.421105] Allocated by task 1430:
[ 25.424606] kasan_kmalloc+0xe0/0x1ac
[ 25.428279] kasan_slab_alloc+0x14/0x1c
[ 25.432127] kmem_cache_alloc+0x178/0x278
[ 25.436149] bfq_get_queue+0x160/0x650
[ 25.439911] bfq_get_bfqq_handle_split+0xcc/0x2fc
[ 25.444627] bfq_init_rq+0x254/0x18c0
[ 25.448301] bfq_insert_requests+0x5d0/0x1048
[ 25.452669] blk_mq_sched_insert_requests+0x130/0x204
[ 25.457734] blk_mq_flush_plug_list+0x844/0x91c
[ 25.462278] blk_flush_plug_list+0x3e4/0x778
[ 25.466559] blk_finish_plug+0x54/0x78
[ 25.470322] read_pages+0x294/0x2f0
[ 25.473824] __do_page_cache_readahead+0x1a4/0x354
[ 25.478628] filemap_fault+0x8ec/0xbb4
[ 25.482389] ext4_filemap_fault+0x84/0xa4
[ 25.486409] __do_fault+0x128/0x338
[ 25.489909] handle_mm_fault+0x1de0/0x2588
[ 25.494017] do_page_fault+0x464/0x8d8
[ 25.497777] do_translation_fault+0x6c/0x88
[ 25.501969] do_mem_abort+0xd8/0x2d0
[ 25.505554] do_el0_ia_bp_hardening+0x13c/0x1a8
[ 25.510094] el0_ia+0x18/0x1c
[ 25.513065]
[ 25.514562] Freed by task 1430:
[ 25.517715] __kasan_slab_free+0x13c/0x21c
[ 25.521821] kasan_slab_free+0x10/0x1c
[ 25.525582] kmem_cache_free+0x7c/0x5f8
[ 25.529429] bfq_put_queue+0x19c/0x2e4
[ 25.533191] bfq_exit_icq_bfqq+0x108/0x228
[ 25.537299] bfq_exit_icq+0x20/0x38
[ 25.540798] ioc_exit_icq+0xe4/0x16c
[ 25.544384] put_io_context_active+0x174/0x234
[ 25.548836] exit_io_context+0x84/0x94
[ 25.552599] do_exit+0x13b4/0x18e4
[ 25.556013] do_group_exit+0x1cc/0x204
[ 25.559775] __wake_up_parent+0x0/0x5c
[ 25.563537] __se_sys_exit_group+0x0/0x24
[ 25.567558] el0_svc_common+0x124/0x1ec
[ 25.571407] el0_svc_compat_handler+0x84/0xb0
[ 25.575774] el0_svc_compat+0x8/0x18
[ 25.579351]
[ 25.580854] The buggy address belongs to the object at fffffff089360338
[ 25.580854] which belongs to the cache bfq_queue of size 464
[ 25.587350] cros-ec-spi spi2.0: SPI transfer timed out
[ 25.593209] The buggy address is located 264 bytes inside of
[ 25.593209] 464-byte region [fffffff089360338, fffffff089360508)
[ 25.593216] The buggy address belongs to the page:
[ 25.593234] page:ffffffbfc224d800 count:1 mapcount:0 mapping:fffffff09916d880 index:0xfffffff089360cc8
[ 25.598388] cros-ec-spi spi2.0: spi transfer failed: -110
[ 25.610092] compound_mapcount: 0
[ 25.610109] flags: 0x4000000000008100(slab|head)
[ 25.610133] raw: 4000000000008100 ffffffbfc14c1908 fffffff0991668a0 fffffff09916d880
[ 25.615021] cros-ec-spi spi2.0: Command xfer error (err:-110)
[ 25.624225] raw: fffffff089360cc8 000000000014000d 00000001ffffffff 0000000000000000
[ 25.624233] page dumped because: kasan: bad access detected
[ 25.624237]
[ 25.624243] Memory state around the buggy address:
[ 25.624257] fffffff089360300: fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb fb
[ 25.624271] fffffff089360380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 25.630223] cros-ec-i2c-tunnel 11012000.spi:ec@0:i2c-tunnel: Error transferring EC i2c message -110
[ 25.632977] >fffffff089360400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 25.632985] ^
[ 25.632997] fffffff089360480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 25.633010] fffffff089360500: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 25.633016] ==================================================================