Re: [Ext2-devel] [RFC] [PATCH] Reducing average ext2 fsck time through fs-wide dirty bit

From: Andreas Dilger
Date: Fri Mar 24 2006 - 16:20:23 EST


On Mar 24, 2006 13:52 -0700, Matthew Wilcox wrote:
> On Fri, Mar 24, 2006 at 12:28:02PM -0700, Andreas Dilger wrote:
> > Fix for this problem (inode is locked already):
> > - create a modified ext3_free_branches() to do tree walking and call a
> > method instead of always calling ext3_free_data->ext3_clear_blocks
> > - walk inode {d,t,}indirect blocks in forward direction, count bitmaps and
> > groups that will be modified (essentially NULL ext3_free_branches method)
> > - try to start a journal handle for this many blocks + 1 (inode) +
> > 1 (super) + quota + EXT3_RESERVE_TRANS_BLOCKS
> > - if journal handle is too large (journal_start() returns -ENOSPC) fall
> > back to old zero-in-steps method (vast majority of cases will be OK
> > because number of modified blocks is much fewer)
>
> Could we try a different fallback in this case? For example, attempt to
> truncate only half as much? Is this even allowed?

What you suggest IS essentially the fallback. The current code will start
truncating at the end and grow the truncation until it can't any longer.
In order to make this operation correct w.r.t. recovery, it HAS to
zero out the already-truncated blocks, because the first transaction
may complete and commit, while the second may not. The proposed new
behaviour is only acceptable because it ensures that the whole truncate
can be completed in a single transaction.
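
As a rough illustration of the proposal quoted above (this is a toy model,
not ext3 code), the "count first, then decide" step could look like the
Python below. The function names, the flat list of block numbers, the
quota value of 4 and the zero reserve are all assumptions made for the
sketch. The real walk would go through the inode's {d,t,}indirect blocks,
and the real limit is whatever journal_start() accepts.

```python
# Toy model of the proposed truncate decision; every name here is hypothetical.
BLOCKS_PER_GROUP = 32768   # assumes 4kB blocks: one bitmap block covers 8 * 4096 blocks
DESCS_PER_BLOCK = 128      # group descriptors per 4kB block, as in "16384 / 128"


def credits_needed(file_blocks, quota_blocks=4, reserve=0):
    """Count the journal blocks a single-transaction truncate would dirty.

    file_blocks: iterable of on-disk block numbers owned by the file.
    quota_blocks, reserve: stand-ins for the quota and
    EXT3_RESERVE_TRANS_BLOCKS terms; the default values are assumptions.
    """
    groups = {blk // BLOCKS_PER_GROUP for blk in file_blocks}
    desc_blocks = {grp // DESCS_PER_BLOCK for grp in groups}
    # one bitmap per touched group, the descriptor blocks covering those
    # groups, then inode + super + quota + reserve
    return len(groups) + len(desc_blocks) + 1 + 1 + quota_blocks + reserve


def choose_truncate_path(file_blocks, max_transaction_blocks):
    """Pick the single-transaction path when it fits, else the old fallback."""
    if credits_needed(file_blocks) <= max_transaction_blocks:
        return "single-transaction"
    return "zero-in-steps"
```

The point of the model is only that the cost depends on how many groups
the file touches, not on how many blocks it has.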


For a rough estimate of the allowable size of a "new" truncate
transaction: the worst-case truncate dirties every group in the filesystem.
A 2TB filesystem has 16384 groups, giving a maximum transaction size of:

(16384 bitmaps + (16384 / 128) group desc + inode + super + quota)
= 16518

This requires a journal 4x that size, about 260MB with 4kB blocks (the
default journal size is 128MB these days for large filesystems). In the
worst case of 1 block/group this covers only a 64MB file, but in the vast
majority of cases there will be more than a single block per group, and a
full file truncate (up to 2TB file size) could fit in the same (or smaller)
transaction size. Best case is about 125MB of file per group (i.e. per 4kB
of journal transaction size).
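
The arithmetic above can be re-derived with a few lines of Python. This
is only a back-of-envelope check assuming 4kB blocks. The split of the
"inode + super + quota" term into 1 + 1 + 4 is an assumption chosen
because it reproduces the 16518 total; the mail does not spell it out.

```python
# Back-of-envelope check of the worst-case transaction estimate (4kB blocks).
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE             # one bitmap block covers 32768 blocks
GROUP_BYTES = BLOCKS_PER_GROUP * BLOCK_SIZE   # 128MB per group
DESCS_PER_BLOCK = 128                         # group descriptors per 4kB block

fs_bytes = 2 * 2**40                          # 2TB filesystem
groups = fs_bytes // GROUP_BYTES              # one bitmap block per group
desc_blocks = groups // DESCS_PER_BLOCK       # group descriptor blocks

INODE, SUPER, QUOTA = 1, 1, 4                 # QUOTA = 4 is an assumption
transaction_blocks = groups + desc_blocks + INODE + SUPER + QUOTA

journal_bytes = 4 * transaction_blocks * BLOCK_SIZE   # journal is 4x the transaction
worst_case_file_bytes = groups * BLOCK_SIZE           # one data block per group

print("groups:", groups)
print("transaction blocks:", transaction_blocks)
print("journal MB:", round(journal_bytes / 2**20))
print("worst-case file MB:", worst_case_file_bytes // 2**20)
```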

With the absolute minimum journal size we could always truncate files
of up to 1MB without fallback, and files of roughly up to 16GB if they
are laid out in chunks of 1/2 group per "extent".

The current code needs ~33 4kB blocks per 128MB of file size.
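
The last two paragraphs can be checked the same way. The sketch below
assumes 4kB blocks, a minimum journal of 1024 blocks, and that one
transaction may use at most 1/4 of the journal (the 4x rule used
earlier). It also ignores the few fixed blocks (group descriptors,
inode, super, quota), hence "roughly". The breakdown of the ~33 figure
into 32 indirect blocks plus 1 double-indirect block is one plausible
reading, not something the mail states.

```python
# Sanity check of the minimum-journal and per-128MB figures (4kB blocks).
BLOCK_SIZE = 4096
GROUP_BYTES = 8 * BLOCK_SIZE * BLOCK_SIZE        # 128MB covered by one bitmap block

# Assumptions: smallest journal is 1024 blocks; a transaction gets 1/4 of it.
MIN_JOURNAL_BLOCKS = 1024
max_transaction = MIN_JOURNAL_BLOCKS // 4        # 256 blocks

# New scheme, worst case: every freed block lives in a different group.
always_ok_bytes = max_transaction * BLOCK_SIZE
# New scheme, half a group freed for each bitmap block dirtied.
half_group_bytes = max_transaction * (GROUP_BYTES // 2)

# Current scheme (assumed breakdown): each indirect block mapping the range
# is journalled, plus the double-indirect block above them.
PTRS_PER_BLOCK = BLOCK_SIZE // 4                 # 32-bit block numbers
data_blocks = GROUP_BYTES // BLOCK_SIZE          # 32768 data blocks in 128MB
indirect_blocks = data_blocks // PTRS_PER_BLOCK  # 32
current_blocks = indirect_blocks + 1             # 33

print(always_ok_bytes // 2**20, "MB always fits")
print(half_group_bytes // 2**30, "GB at 1/2 group per chunk")
print(current_blocks, "journal blocks per 128MB with the current code")
```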

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
