Re: [PATCH] btrfs: index buffer_tree using node size
From: David Sterba
Date: Fri Jun 06 2025 - 13:28:44 EST
On Mon, May 12, 2025 at 07:23:20PM +0200, Daniel Vacek wrote:
> So far we are deriving the buffer tree index using the sector size. But each
> extent buffer covers multiple sectors. This makes the buffer tree rather sparse.
>
> For example the typical and quite common configuration uses sector size of 4KiB
> and node size of 16KiB. In this case it means the buffer tree is using up to
> the maximum of 25% of it's slots. Or in other words at least 75% of the tree
> slots are wasted as never used.
>
> We can score significant memory savings on the required tree nodes by indexing
> the tree using the node size instead. As a result far less slots are wasted
> and the tree can now use up to all 100% of it's slots this way.
>
> Signed-off-by: Daniel Vacek <neelx@xxxxxxxx>
This is a useful improvement, so we should continue and merge it. The
performance improvements should be done so we get some idea. Runtime and
slab savings.
One coding comment, please rename node_bits to nodesize_bits so it's
consistent with sectorsize and sectorsize_bits.
> fs/btrfs/disk-io.c | 1 +
> fs/btrfs/extent_io.c | 30 +++++++++++++++---------------
> fs/btrfs/fs.h | 3 ++-
> 3 files changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5bcf11246ba66..dcea5b0a2db50 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4294,9 +4294,9 @@ static int try_release_subpage_extent_buffer(struct folio *folio)
> {
> struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
> struct extent_buffer *eb;
> - unsigned long start = folio_pos(folio) >> fs_info->sectorsize_bits;
> + unsigned long start = folio_pos(folio) >> fs_info->node_bits;
> unsigned long index = start;
> - unsigned long end = index + (PAGE_SIZE >> fs_info->sectorsize_bits) - 1;
> + unsigned long end = index + (PAGE_SIZE >> fs_info->node_bits) - 1;
This looks a bit suspicious, page size is 4k node size can be 4k .. 64k.
It's in subpage code so sector < page, the shift it's always >= 0. Node
can be larger so the shift result would be 0 but as a result of 4k
shifted by "16k".
> int ret;
>
> xa_lock_irq(&fs_info->buffer_tree);
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index cf805b4032af3..8c9113304fabe 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -778,8 +778,9 @@ struct btrfs_fs_info {
>
> struct btrfs_delayed_root *delayed_root;
>
> - /* Entries are eb->start / sectorsize */
> + /* Entries are eb->start >> node_bits */
> struct xarray buffer_tree;
> + int node_bits;
u32 and pleas move it to nodesize.