On Mon, Jun 23, 2025 at 06:22:44PM +0930, Qu Wenruo wrote:
On 2025/6/23 17:49, Byungchul Park wrote:
On Mon, Jun 23, 2025 at 03:20:43PM +0930, Qu Wenruo wrote:
On 2025/6/23 12:51, Byungchul Park wrote:
Hi folks,
Thanks to Yunseong, we got two DEPT reports in btrfs. They don't necessarily
point to obvious deadlocks, but after digging into the reports, I'm wondering
whether the following could happen by any chance.

1) The first scenario that I'm concerned about is:
   context A                               context B

   do_truncate()
      ...
      btrfs_do_readpage() // with folio lock held
This one is for data.
Do you mean this folio is for data? Thanks for the confirmation.
Yes, only data folios will go through btrfs_do_readpage().
For metadata, we never go through btrfs_do_readpage(); we go through
read_extent_buffer_pages_nowait() instead.
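To spell out the consequence of that split for the lock picture above: a folio belongs to exactly one mapping, data folios to a regular file inode's mapping and metadata folios to the btree inode's mapping, so the folio lock taken around btrfs_do_readpage() and a lock on a metadata folio can never be the same lock. A toy userspace sketch of just that property (made-up structs and names, not btrfs code):

/*
 * Toy model, not btrfs code: every folio belongs to exactly one mapping.
 * Data folios live in a regular file inode's mapping, metadata folios in
 * the btree inode's mapping, so locking a data folio never contends with
 * locking a metadata folio, even at the same index.
 */
#include <pthread.h>
#include <stdio.h>

struct folio {
	pthread_mutex_t lock;	/* stands in for folio_lock() */
	const char *owner;	/* which mapping this folio belongs to */
};

/* One folio at index 0 of each mapping, in place of a real page cache. */
static struct folio data_folio = { .lock = PTHREAD_MUTEX_INITIALIZER, .owner = "data inode" };
static struct folio meta_folio = { .lock = PTHREAD_MUTEX_INITIALIZER, .owner = "btree inode" };

int main(void)
{
	pthread_mutex_lock(&data_folio.lock);	/* what btrfs_do_readpage() runs under */
	pthread_mutex_lock(&meta_folio.lock);	/* does not block: a different folio object */
	printf("locked a folio of the %s and one of the %s without contention\n",
	       data_folio.owner, meta_folio.owner);
	pthread_mutex_unlock(&meta_folio.lock);
	pthread_mutex_unlock(&data_folio.lock);
	return 0;
}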
                                           do_unlinkat()
                                              ...
                                              push_leaf_right()
                                                 btrfs_tree_lock_nested()
                                                    down_write_nested(&eb->lock) // hold
This is struct extent_buffer's rw_sem. Right?
         btrfs_get_extent()
            btrfs_lookup_file_extent()
               btrfs_search_slot()
                  down_read_nested(&eb->lock) // stuck
This one is for metadata.
I don't get this actually.
This is struct extent_buffer's rw_sem, too. Can't this rw_sem, taken for read
here in context A, be the same one as the rw_sem held for write above in
context B?
My bad, I thought you were talking about that down_read_nested() conflicting
with the folio lock.
But if you're talking about extent_buffer::lock, then the one in context B
will wait for the one in context A, and that's expected.
Sounds good.
Data and metadata page cache will never cross into each other.
Thanks,
Qu
                                                 __push_leaf_right()
                                                    ...
                                                    folio_lock() // stuck
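To make the suspected inversion concrete, here is a minimal userspace model of the two contexts above (pthreads, with a mutex standing in for the folio lock and an rwlock for extent_buffer::lock; all names are illustrative and none of this is btrfs code). The cycle only forms if the folio that __push_leaf_right() ends up waiting on is the same folio the read path in context A is holding, which is exactly the question the rest of the thread tries to settle:

/*
 * Toy model of the reported scenario; NOT btrfs code.
 *
 * Context A: holds a folio lock, then takes eb->lock for read.
 * Context B: holds eb->lock for write, then takes a folio lock.
 *
 * Run with "same" to make both contexts contend for the same folio lock:
 * the program then hangs, which is the cycle the DEPT report describes.
 * With two different folio locks (the default), both threads finish.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static pthread_rwlock_t eb_lock = PTHREAD_RWLOCK_INITIALIZER;		/* extent_buffer::lock stand-in */
static pthread_mutex_t data_folio_lock = PTHREAD_MUTEX_INITIALIZER;	/* folio held by the read path */
static pthread_mutex_t meta_folio_lock = PTHREAD_MUTEX_INITIALIZER;	/* folio touched in __push_leaf_right() */
static pthread_mutex_t *folio_wanted_by_b = &meta_folio_lock;

static void *context_a(void *arg)	/* models do_truncate() -> btrfs_do_readpage() */
{
	(void)arg;
	pthread_mutex_lock(&data_folio_lock);	/* folio lock held around btrfs_do_readpage() */
	sleep(1);				/* let B take eb->lock for write first */
	pthread_rwlock_rdlock(&eb_lock);	/* btrfs_search_slot(): waits behind B's writer */
	puts("context A: got eb->lock");
	pthread_rwlock_unlock(&eb_lock);
	pthread_mutex_unlock(&data_folio_lock);
	return NULL;
}

static void *context_b(void *arg)	/* models do_unlinkat() -> push_leaf_right() */
{
	(void)arg;
	pthread_rwlock_wrlock(&eb_lock);	/* btrfs_tree_lock_nested(): eb->lock held */
	sleep(1);				/* let A take its folio lock first */
	pthread_mutex_lock(folio_wanted_by_b);	/* __push_leaf_right(): which folio is this? */
	puts("context B: got the folio lock");
	pthread_mutex_unlock(folio_wanted_by_b);
	pthread_rwlock_unlock(&eb_lock);
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t a, b;

	if (argc > 1 && strcmp(argv[1], "same") == 0)
		folio_wanted_by_b = &data_folio_lock;	/* force the ABBA cycle */

	pthread_create(&a, NULL, context_a, NULL);
	pthread_create(&b, NULL, context_b, NULL);
	pthread_join(a, NULL);	/* never returns in the "same" case */
	pthread_join(b, NULL);
	puts("no deadlock: the two folio locks were different objects");
	return 0;
}

So whether this can really lock up hinges entirely on whether the folio_lock() in __push_leaf_right() can ever hit the folio that context A holds across btrfs_do_readpage().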
Did you mean this folio is always for metadata?
Can you explain more about where this folio_lock() comes from?
I should also rely on the following stacktrace in the dept report. I asked
Yunseong, who reported this issue, for the decoded stacktrace so that I can
interpret it better. I will get back once I figure out where the wait on
PG_locked comes from.
[ 304.344198][ T7488] [W] dept_page_wait_on_bit(pg_locked_map:0):
[ 304.344211][ T7488] [<ffff8000823b1d20>] __push_leaf_right+0x8f0/0xc70
[ 304.344232][ T7488] stacktrace:
[ 304.344241][ T7488] __push_leaf_right+0x8f0/0xc70
[ 304.344260][ T7488] push_leaf_right+0x408/0x628
[ 304.344278][ T7488] btrfs_del_items+0x974/0xaec
[ 304.344297][ T7488] btrfs_truncate_inode_items+0x1c5c/0x2b00
[ 304.344314][ T7488] btrfs_evict_inode+0xa4c/0xd38
[ 304.344335][ T7488] evict+0x340/0x7b0
[ 304.344352][ T7488] iput+0x4ec/0x840
[ 304.344369][ T7488] do_unlinkat+0x444/0x59c
[ 304.344388][ T7488] __arm64_sys_unlinkat+0x11c/0x260
[ 304.344407][ T7488] invoke_syscall+0x88/0x2e0
[ 304.344425][ T7488] el0_svc_common.constprop.0+0xe8/0x2e0
[ 304.344445][ T7488] do_el0_svc+0x44/0x60
[ 304.344463][ T7488] el0_svc+0x50/0x188
[ 304.344482][ T7488] el0t_64_sync_handler+0x10c/0x140
[ 304.344503][ T7488] el0t_64_sync+0x198/0x19c
Thanks.
Byungchul
I didn't see any location where __push_leaf_right(), nor the original
do_unlinkat() path, is locking a folio.
So here I can only guess the folio is from the __push_leaf_right() context,
which means it can only be a metadata folio.
If no, it could lead to a deadlock in my opinion. If yes, dept should
assign different classes to folios for data and metadata.
So far I believe the folio belongs to metadata.
And btrfs handles metadata folios very differently. It is a little confusing
in that we have a btree_inode to handle the metadata page cache but no
read_folio() callback for it, which can easily confuse automatic tools.
Thanks,
Qu
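Regarding the "assign different classes" idea above: the data/metadata distinction is cheap to express, since the owning inode of a folio's mapping already says which side the folio is on (the btree inode for metadata, a regular file inode for data). A rough userspace sketch of that keying with made-up types and names; this is neither the real DEPT API nor btrfs code:

/*
 * Sketch only: classify a folio-lock wait by whether the folio's mapping
 * is owned by the btree inode (metadata) or a regular file inode (data).
 * All types and names here are hypothetical.
 */
#include <stdio.h>

struct inode { unsigned long ino; };

struct address_space {
	struct inode *host;		/* owning inode of this mapping */
};

struct folio {
	struct address_space *mapping;
};

enum folio_lock_class {
	FOLIO_LOCK_DATA,
	FOLIO_LOCK_METADATA,
};

/* The single inode whose mapping holds all metadata folios in this model. */
static struct inode btree_inode = { .ino = 1 };

static enum folio_lock_class folio_lock_class(const struct folio *folio)
{
	if (folio->mapping->host == &btree_inode)
		return FOLIO_LOCK_METADATA;
	return FOLIO_LOCK_DATA;
}

int main(void)
{
	struct inode file_inode = { .ino = 257 };
	struct address_space btree_mapping = { .host = &btree_inode };
	struct address_space file_mapping = { .host = &file_inode };
	struct folio meta_folio = { .mapping = &btree_mapping };
	struct folio data_folio = { .mapping = &file_mapping };

	printf("metadata folio -> class %d, data folio -> class %d\n",
	       folio_lock_class(&meta_folio), folio_lock_class(&data_folio));
	return 0;
}

With two classes like this, the folio wait in __push_leaf_right() and the folio lock held around btrfs_do_readpage() would no longer be folded into one class, which appears to be the dependency the report above was built on.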