Re: [RFC] DEPT report on around btrfs, unlink, and truncate

From: Qu Wenruo
Date: Mon Jun 23 2025 - 06:00:37 EST




在 2025/6/23 19:22, Byungchul Park 写道:
On Mon, Jun 23, 2025 at 06:22:44PM +0930, Qu Wenruo wrote:
在 2025/6/23 17:49, Byungchul Park 写道:
On Mon, Jun 23, 2025 at 03:20:43PM +0930, Qu Wenruo wrote:
在 2025/6/23 12:51, Byungchul Park 写道:
Hi folks,

Thanks to Yunseong, we got two DEPT reports in btrfs. It doesn't mean
it's obvious deadlocks, but after digging into the reports, I'm
wondering if it could happen by any chance.

1) The first scenario that I'm concerning is:

context A context B

do_truncate()
...
btrfs_do_readpage() // with folio lock held

This one is for data.

Do you mean this folio is for data? Thanks for the confirmation.

Yes, only data folios will go through btrfs_do_readpage().

For metadata, we never go through btrfs_do_readpage(), but
read_extent_buffer_pages_nowait().



do_unlinkat()
...
push_leaf_right()
btrfs_tree_lock_nested()
down_write_nested(&eb->lock) // hold

This is struct extent_buffer's rw_sem. Right?

btrfs_get_extent()
btrfs_lookup_file_extent()
btrfs_search_slot()
down_read_nested(&eb->lock) // stuck

This one is for metadata.
^
I don't get this actually.

This is struct extent_buffer's rw_sem, too. Cannot this rw_sem be the
same as the rw_sem above in context A?

My bad, I thought you're talking about that down_read_nested()
conflicting with folio lock.

But if you're talking about extent_buffer::lock, then the one in context
B will wait for the one in context A, and that's expected.

Sounds good.

Data and metadata page cache will never cross into each other.

Thanks,
Qu

__push_leaf_right()
...
folio_lock() // stuck

Did you mean this folio is always for metadata?

Can you explain more on where this folio_lock() comes from?

I should also rely on the following stacktrace in the dept report. I
asked Yunseong who reported this issue, for the decoded stacktrace, so
that I can interpret that better. I will get back once I figure out
where the wait on PG_locked comes from.

[ 304.344198][ T7488] [W] dept_page_wait_on_bit(pg_locked_map:0):
[ 304.344211][ T7488] [<ffff8000823b1d20>] __push_leaf_right+0x8f0/0xc70

I believe it's from btrfs_clear_buffer_dirty():

As we have a for() loop iterating all the folios of a an extent buffer (aka, metadata structure), then clear the dirty flags.

The same applies to btrfs_mark_buffer_dirty() -> set_extent_buffer_dirty().

In that case, the folio is 100% belonging to btree inode thus metadata.

Thus the folio lock can not conflict with a data folio, thus there should be no deadlock.

Thanks,
Qu


[ 304.344232][ T7488] stacktrace:
[ 304.344241][ T7488] __push_leaf_right+0x8f0/0xc70
[ 304.344260][ T7488] push_leaf_right+0x408/0x628
[ 304.344278][ T7488] btrfs_del_items+0x974/0xaec
[ 304.344297][ T7488] btrfs_truncate_inode_items+0x1c5c/0x2b00
[ 304.344314][ T7488] btrfs_evict_inode+0xa4c/0xd38
[ 304.344335][ T7488] evict+0x340/0x7b0
[ 304.344352][ T7488] iput+0x4ec/0x840
[ 304.344369][ T7488] do_unlinkat+0x444/0x59c
[ 304.344388][ T7488] __arm64_sys_unlinkat+0x11c/0x260
[ 304.344407][ T7488] invoke_syscall+0x88/0x2e0
[ 304.344425][ T7488] el0_svc_common.constprop.0+0xe8/0x2e0
[ 304.344445][ T7488] do_el0_svc+0x44/0x60
[ 304.344463][ T7488] el0_svc+0x50/0x188
[ 304.344482][ T7488] el0t_64_sync_handler+0x10c/0x140
[ 304.344503][ T7488] el0t_64_sync+0x198/0x19c

Thanks.

Byungchul

I didn't see any location where __push_leaf_right() is locking a folio
nor the original do_unlinkat().

So here I can only guess the folio is from __push_leaf_right() context,
that means it can only be a metadata folio.


If no, it could lead a deadlock in my opinion. If yes, dept should
assign different classes to folios between data data and metadata.

So far I believe the folio belongs to metadata.

And since btrfs has very different handling of metadata folios, and it's
a little confusing that, we also have a btree_inode to handle the
metadata page cache, but do not have read_folio() callbacks, it can be a
little confusing to some automatic tools.

Thanks,
Qu