Re: [PATCH v2 7/8] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list

From: Ojaswin Mujoo
Date: Mon Jan 16 2023 - 03:16:46 EST


On Tue, Nov 29, 2022 at 08:07:50PM +0530, Ojaswin Mujoo wrote:
> On Mon, Nov 28, 2022 at 10:06:34PM -0500, Theodore Ts'o wrote:
> > Looking at the stack trace it looks like we're hitting this BUG_ON:
> >
> > spin_lock(&tmp_pa->pa_lock);
> > if (tmp_pa->pa_deleted == 0)
> > BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start));
> > spin_unlock(&tmp_pa->pa_lock);
> >
> > ... in the inline function ext4_mb_pa_assert_overlap(), called from
> > ext4_mb_pa_adjust_overlap().
> >
> > The generic/269 test runs fstress with an ENOSPC hitter as an
> > antogonist process. The ext3 configuration disables delayed
> > allocation, which means that fstress is going to be allocating blocks
> > at write time (instead of dirty page writeback time).
> >
> > Could you take a look? Thanks!
> Hi Ted,
>
> Thanks for pointing this out, I'll have a look into this.
>
> PS: I'm on vacation so might be a bit slow to update on this.
>
> Regards,
> Ojaswin
> >
> > - Ted

Hi Ted,

Apologies for the delay on this due to new years/replication issues. I
was not able to replicate it at my end for some time before ultimately
replicating this in gce-xfstests. I have sent a v3 [1] that fixes the
issue by introducing an optimization related to delted PAs found in
rbtree traversal in functions like ext4_mb_adjust_overlap() and
ext4_mb_use_preallocated(). Let me quote the commit message which
explains the issue and the fix we proposed:

---- snip ----

In ext4_mb_adjust_overlap(), it is possible for an allocating thread to
come across a PA that is marked deleted via
ext4_mb_discard_group_preallocations() but not yet removed from the
inode PA rbtree. In such a case, we ignore any overlaps with this PA
node and simply move forward in the rbtree by comparing logical start of
to-be-inserted PA and the deleted PA node.

Although this usually doesn't cause an issue and we can move forward in
the tree, in one special case, i.e if range of deleted PA lies
completely inside the normalized range, we might require to travese both
the sub-trees under such a deleted PA.

To simplify this special scenario and also as an optimization, undelete
the PA If the allocating thread determines that this PA might be needed
in the near future. This results in the following changes:

- ext4_mb_use_preallocated(): Use a deleted PA if original request lies in it.
- ext4_mb_pa_adjust_overlap(): Undelete a PA if it is deleted but there
is an overlap with the normalized range.
- ext4_mb_discard_group_preallocations(): Rollback delete operation if
allocation path undeletes a PA before it is erased from inode PA
list.

Since this covers the special case we discussed above, we will always
un-delete the PA when we encounter the special case and we can then
adjust for overlap and traverse the PA rbtree without any issues.

---- snip ----

The above optimization is now straight forward as we completely lock the
rbtree during traversals and modifications. Earlier in case of list,
the locking would have been tricky due to usage of RCU.

[1]
https://lore.kernel.org/linux-ext4/20230116080216.249195-1-ojaswin@xxxxxxxxxxxxx/