Re: [PATCH 1/5] ext4: restart handle if credits are insufficient during writepages
From: Zhang Yi
Date: Fri Jun 06 2025 - 23:54:40 EST
On 2025/6/6 21:16, Jan Kara wrote:
> On Fri 06-06-25 14:54:21, Zhang Yi wrote:
>> On 2025/6/5 22:04, Jan Kara wrote:
>>>> + /*
>>>> + * The credits for the current handle and transaction have
>>>> + * reached their upper limit, stop the handle and initiate a
>>>> + * new transaction. Note that some blocks in this folio may
>>>> + * have been allocated, and these allocated extents are
>>>> + * submitted through the current transaction, but the folio
>>>> + * itself is not submitted. To prevent stale data and
>>>> + * potential deadlock in ordered mode, only the
>>>> + * dioread_nolock mode supports this.
>>>> + */
>>>> + if (err > 0) {
>>>> + WARN_ON_ONCE(!ext4_should_dioread_nolock(inode));
>>>> + mpd->continue_map = 1;
>>>> + err = 0;
>>>> + goto update_disksize;
>>>> + }
>>>> } while (map->m_len);
>>>>
>>>> update_disksize:
>>>> @@ -2467,6 +2501,9 @@ static int mpage_map_and_submit_extent(handle_t *handle,
>>>> if (!err)
>>>> err = err2;
>>>> }
>>>> + if (!err && mpd->continue_map)
>>>> + ext4_get_io_end(io_end);
>>>> +
>>>
>>> IMHO it would be more logical to not call ext4_put_io_end[_deferred]() in
>>> ext4_do_writepages() if we see we need to continue doing mapping for the
>>> current io_end.
>>>
>>> That way it would be also more obvious that you've just reintroduced
>>> deadlock fixed by 646caa9c8e196 ("ext4: fix deadlock during page
>>> writeback"). This is actually a fundamental thing because for
>>> ext4_journal_stop() to complete, we may need IO on the folio to finish
>>> which means we need io_end to be processed. Even if we avoided the awkward
>>> case with sync handle described in 646caa9c8e196, to be able to start a new
>>> handle we may need to complete a previous transaction commit to be able to
>>> make space in the journal.
>>
>> Yeah, you are right, I missed the full folios that were attached to the
>> same io_end in the previous rounds. If we continue to use this solution,
>> I think we should split the io_end and submit the previous one which
>> includes those full folios before the previous transaction is
>> committed.
>
> Yes, fully mapped folios definitely need to be submitted. But I think that
> should be handled by ext4_io_submit() call in ext4_do_writepages() loop?
Sorry, my previous description may not have been clear enough. The
deadlock in this solution would be:
1. In the latest round of ext4_do_writepages(),
   mpage_prepare_extent_to_map() may add some contiguous fully mapped
   folios and an unmapped folio (A) to the current processing io_end.
2. mpage_map_and_submit_extent() maps some bhs in folio A and bails out
   due to insufficient journal credits. It acquires one more refcount
   on the io_end to prevent ext4_put_io_end() from converting the
   newly allocated extents to written status, since we do not submit
   this partial folio now.
3. ext4_io_submit() in ext4_do_writepages() submits the fully mapped
   folios, but the endio processing cannot complete because of the
   extra refcount above, so the writeback state of those folios cannot
   be cleared.
4. Finally, it stops the current handle and waits for the transaction
   to be committed. However, the commit process also waits for
   writeback on those fully mapped folios to complete, which leads to
   a deadlock.
Therefore, if the io_end contains both fully mapped folios and a partial
folio, we need to split the io_end: the first one contains the fully
mapped folios, and the second one contains only the partial folio. We
hold the extra refcount only on the second io_end, so the first one can
be finished properly, which breaks the deadlock.
However, the solution of submitting the partial folio sounds simpler
to me.
[..]
>>> Then once IO completes
>>> mpage_prepare_extent_to_map() is able to start working on the folio again.
>>> Since we cleared dirty bits in the buffers we should not be repeating the
>>> work we already did...
>>>
>>
>> Hmm, it looks like this solution should work. We should introduce a
>> partial folio version of mpage_submit_folio(), call it, and redirty
>> the folio once we need to bail out of the loop due to insufficient
>> space or journal credits. However, ext4_bio_write_folio() also handles
>> the fscrypt case; I'm not familiar with fscrypt, so I'm not sure it
>> can handle a partial page properly. I'll give it a try.
>
> As far as I can tell it should work fine. The logic in
> ext4_bio_write_folio() is already prepared for handling partial folio
> writeouts, redirtying of the page etc. (because it needs to handle writeout
> from transaction commit where we can writeout only parts of folios with
> underlying blocks allocated). We just need to teach mpage_submit_folio() to
> subtract only the number of written-out pages from nr_to_write.
>
Yes, indeed. I will try this solution.
Thanks,
Yi.