Re: ext3 allocate-with-reservation latencies

From: Stephen C. Tweedie
Date: Wed Apr 06 2005 - 04:53:50 EST


Hi,

On Wed, 2005-04-06 at 06:35, Mingming Cao wrote:

> It seems we are holding the rsv_block while searching the bitmap for a
> free bit.

Probably something to avoid!

> In alloc_new_reservation(), we first find a available to
> create a reservation window, then we check the bitmap to see if it
> contains any free block. If not, we will search for next available
> window, so on and on. During the whole process we are holding the global
> rsv_lock. We could, and probably should, avoid that. Just unlock the
> rsv_lock before the bitmap search and re-grab it after it. We need to
> make sure that the available space that are still available after we re-
> grab the lock.

Not necessarily. As long as the windows remain mutually exclusive in
the rbtree, it doesn't matter too much if we occasionally allocate a
full one --- as long as that case is rare, the worst that happens is
that we fail to allocate from the window and have to repeat the window
reserve.

The difficulty will be in keeping it rare. What we need to avoid is the
case where multiple tasks need a new window, they all drop the lock,
find the same bits free in the bitmap, then all try to take that
window. One will succeed, the others will fail; but as the files in
that bit of the disk continue to grow, we risk those processes
*continually* repeating the same stomp-on-each-others'-windows operation
and raising the allocation overhead significantly.

> Another option is to hold that available window before we release the
> rsv_lock, and if there is no free bit inside that window, we will remove
> it from the tree in the next round of searching for next available
> window.

Possible, but not necessarily nice. If you've got a nearly-full disk,
most bits will be already allocated. As you scan the bitmaps, it may
take quite a while to find a free bit; do you really want to (a) lock
the whole block group with a temporary window just to do the scan, or
(b) keep allocating multiple smaller windows until you finally find a
free bit? The former is bad for concurrency if you have multiple tasks
trying to allocate nearby on disk --- you'll force them into different
block groups. The latter is high overhead.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/