[BUG] BUG: scheduling while atomic in throttle_direct_reclaim

From: Xianying Wang
Date: Mon May 26 2025 - 11:49:49 EST


Hi,

I found a kernel crash reported as "BUG: scheduling while atomic" in
throttle_direct_reclaim. The issue occurs in the memory reclaim path:
throttle_direct_reclaim (mm/vmscan.c) attempts a potentially blocking
operation (schedule_timeout) while the task is still in an atomic,
non-preemptible context. Scheduling is not valid in that state, so the
scheduler calls __schedule_bug().
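
For readers less familiar with this check, here is a minimal userspace
sketch of the rule that fires. It is not kernel code. The single global
counter is an assumption standing in for the kernel's preempt count, and
every model_* name is hypothetical, chosen only to mirror the idea that
a sleeping call is rejected while the count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: one global counter models the kernel's preempt count. */
static int model_preempt_count;
static int model_bug_reports;

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    /* Catch unbalanced enable calls in this model. */
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

static bool model_in_atomic(void)
{
    return model_preempt_count != 0;
}

/*
 * Hypothetical stand-in for a sleeping call such as schedule_timeout.
 * Returns true if sleeping was legal, false if the model reported the
 * "scheduling while atomic" condition instead.
 */
static bool model_schedule_timeout(const char *caller)
{
    if (model_in_atomic()) {
        model_bug_reports++;
        fprintf(stderr,
                "BUG: scheduling while atomic: %s, preempt_count=%d\n",
                caller, model_preempt_count);
        return false;
    }
    return true;
}
```

In this model, calling model_schedule_timeout() between
model_preempt_disable() and model_preempt_enable() is what produces the
report; the same call outside that window succeeds.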

The crash trace shows that this condition can occur when the kernel
mounts a specially crafted ISO9660 image via syz_mount_image$iso9660.
During image parsing, the VFS initiates page readahead through
read_pages, which issues block I/O backed by a loop device. This leads
to a SCSI read path where scsi_alloc_sgtables
(drivers/scsi/scsi_lib.c) attempts to allocate memory for a
scatterlist using mempool_alloc. If memory pressure is present,
mempool_alloc triggers try_to_free_pages, and subsequently
throttle_direct_reclaim.

At this point, the task is likely in an atomic context, due to earlier
direct reclaim or to preemption being disabled within the block layer
or SCSI stack. Sleeping in schedule_timeout is not allowed in that
state, so the scheduler reports the BUG.

I recommend reviewing the reclaim context propagation in:

 - scsi_alloc_sgtables and sg_alloc_table_chained
 - mempool_alloc in SCSI I/O paths
 - throttle_direct_reclaim, to ensure blocking calls are not made
   from atomic contexts
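
The property to review can be stated as a small check: an allocation
made from atomic context must not carry a mask that permits direct
reclaim. The sketch below is a userspace model under stated
assumptions. The bit values are made up for illustration, the MODEL_*
and model_* names are hypothetical, and the composition of the two
masks follows the usual description of GFP_ATOMIC (no direct reclaim)
and GFP_KERNEL (direct reclaim allowed) rather than any particular
kernel version.

```c
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's real definitions. */
#define MODEL_GFP_HIGH           0x01u
#define MODEL_GFP_IO             0x02u
#define MODEL_GFP_FS             0x04u
#define MODEL_GFP_DIRECT_RECLAIM 0x08u
#define MODEL_GFP_KSWAPD_RECLAIM 0x10u

/* Atomic-style mask: may wake background reclaim, never blocks. */
#define MODEL_GFP_ATOMIC (MODEL_GFP_HIGH | MODEL_GFP_KSWAPD_RECLAIM)

/* Kernel-style mask: may enter direct reclaim and therefore sleep. */
#define MODEL_GFP_KERNEL (MODEL_GFP_DIRECT_RECLAIM | \
                          MODEL_GFP_KSWAPD_RECLAIM | \
                          MODEL_GFP_IO | MODEL_GFP_FS)

/* True if the mask lets the allocator block in direct reclaim. */
static bool model_allow_blocking(unsigned int gfp)
{
    return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* An allocation is unsafe only when an atomic caller may block. */
static bool model_alloc_is_safe(bool caller_is_atomic, unsigned int gfp)
{
    return !(caller_is_atomic && model_allow_blocking(gfp));
}
```

Under this model, an atomic caller passing MODEL_GFP_KERNEL is the
only combination flagged as unsafe, which is the situation the review
above is meant to rule out.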

This can be reproduced on:

HEAD commit: e8f897f4afef0031fe618a8e94127a0934896aba

report: https://pastebin.com/raw/bxuLHCgu

console output: https://pastebin.com/raw/mCZ4Ap8Q

kernel config: https://pastebin.com/raw/aJ9rUnhG

C reproducer: https://pastebin.com/raw/1dku01DG

Best regards,

Xianying