Re: huge fsync latencies for a small file on ext4

From: Jan Kara
Date: Tue Feb 26 2019 - 03:30:13 EST


On Mon 25-02-19 10:40:07, Sahitya Tummala wrote:
> On Tue, Feb 19, 2019 at 02:53:02PM +0100, Jan Kara wrote:
> > One has to be really careful when using i_size like this. By the time the
> > transaction is committing, i_size could have been reduced from the value at
> > the time page writeback was issued. And that change will be journalled only
> > in the following transaction. So if the system crashes at the wrong moment,
> > the user could see uninitialized blocks between new_size and old_size after
> > journal replay. So I don't think your patch is really correct.
> >
>
> Thanks Jan for the clarification on the patch. I agree with your comments.
>
> From that discussion, I think the problem being discussed is the journal
> thread waiting for ongoing updates to the active transaction to finish,
> which causes the commit latencies.

Yes.

> And I think the proposal is to not hold any handle while extents are being
> mapped in ext4_map_blocks(), but to defer it until the IO is completely done.

Yes, the real block allocation and insertion into the extent tree will happen
after IO completion.
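
As a rough illustration of why that ordering matters, here is a toy userspace
model (not kernel code; every toy_* name is made up for this sketch). It tracks
one block and counts how often a commit has to wait for data IO. In the
data=ordered style, the mapping joins the transaction at submission time, so
the commit must wait for the data. In the deferred style, the mapping joins the
transaction only from IO completion, so the commit never waits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for a single file block. Purely illustrative. */
struct toy_block {
	bool data_on_disk;      /* data IO for the block has completed */
	bool mapping_in_txn;    /* mapping update sits in the running transaction */
	bool mapping_committed; /* mapping update is in a committed transaction */
};

/* After a crash, stale data is visible if the mapping survived but the data did not. */
bool toy_exposes_stale_data(const struct toy_block *b)
{
	return b->mapping_committed && !b->data_on_disk;
}

/* data=ordered style: the block is mapped under a handle when writeback is submitted. */
void toy_ordered_submit(struct toy_block *b)
{
	b->mapping_in_txn = true;
}

/* Deferred style: submission only starts data IO, nothing is journalled yet. */
void toy_deferred_submit(struct toy_block *b)
{
	(void)b;
}

/* Deferred style: the mapping is journalled only once the data is on disk. */
void toy_deferred_io_done(struct toy_block *b)
{
	b->data_on_disk = true;
	b->mapping_in_txn = true;
}

/* Commit the running transaction. Returns how many data waits were needed. */
int toy_commit(struct toy_block *b)
{
	int waits = 0;

	if (b->mapping_in_txn) {
		if (!b->data_on_disk) {
			/* The commit has to wait for writeback to finish first. */
			b->data_on_disk = true;
			waits++;
		}
		b->mapping_committed = true;
		b->mapping_in_txn = false;
	}
	/* Either way, a committed mapping never points at unwritten data. */
	assert(!toy_exposes_stale_data(b));
	return waits;
}
```

Calling toy_ordered_submit() followed by toy_commit() reports one wait, while
toy_deferred_submit(), toy_deferred_io_done() and then toy_commit() reports
none. Both keep the "no stale data" invariant; only the second keeps data
waits out of the commit path.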

> And with the new proposal, since the inode will be added to
> transaction->t_inode_list only after the IO is completed, there will no
> longer be a need to call journal_finish_inode_data_buffers() in the journal
> context, and thus this problem will also not be observed? Is my understanding
> correct? Please clarify.

Actually, with the new proposal, we can just completely stop adding inodes
to transaction->t_inode_list. But otherwise you're right.

Honza

>
> > Ted has outlined a plan for getting rid of the data=ordered limitations [1] and
> > thus also this problem. It is quite some work but you're certainly welcome
> > to help out :)
> >
> > [1] https://www.spinics.net/lists/linux-ext4/msg64175.html
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR