Re: [PATCH] RFC: loop: Avoid calling blk_mq_freeze_queue() when possible.

From: Bart Van Assche
Date: Wed Aug 14 2019 - 11:29:56 EST


On 8/14/19 3:32 AM, Martijn Coenen wrote:
Since Android Q, the creation and configuration of loop devices are in
the critical path of device boot. We found that the configuration of
loop devices is pretty slow, because many ioctl()s involve freezing the
block queue, which in turn needs to wait for an RCU grace period. On
Android devices we've observed up to 60 ms for the creation and
configuration of a single loop device; as we anticipate creating many
more in the future, we'd like to avoid this delay.

This allows LOOP_SET_BLOCK_SIZE to be called before the loop device has
been bound; since the block queue is not running at that point, we can
avoid the expensive freezing of the queue.
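
For readers who want to reproduce the call, here is a minimal sketch of
what a program such as ./set_block_size might do. The helper name
set_loop_block_size() is hypothetical and is not taken from the patch;
it only assumes the LOOP_SET_BLOCK_SIZE ioctl from <linux/loop.h>, which
takes the new block size as its argument.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): ask the loop driver to use
 * block_size as the logical block size of the device at loop_path.
 * Returns 0 on success or a negative errno value on failure.
 */
int set_loop_block_size(const char *loop_path, unsigned long block_size)
{
	int ret = 0;
	int fd = open(loop_path, O_RDWR);

	if (fd < 0)
		return -errno;

	/* Save errno before close() gets a chance to overwrite it. */
	if (ioctl(fd, LOOP_SET_BLOCK_SIZE, block_size) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

Calling set_loop_block_size("/dev/loop0", 4096) on an unbound device
would be expected to return -ENXIO without this patch, per the
description further down; with the patch the call should succeed
without freezing the queue.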

On a recent x86, this patch yields the following results:

===
Call LOOP_SET_BLOCK_SIZE on /dev/loop0 before being bound
===
~# time ./set_block_size

real 0m0.002s
user 0m0.000s
sys 0m0.002s

===
Call LOOP_SET_BLOCK_SIZE on /dev/loop0 after being bound
===
~# losetup /dev/loop0 fs.img
~# time ./set_block_size

real 0m0.008s
user 0m0.000s
sys 0m0.002s

Over many runs, calling LOOP_SET_BLOCK_SIZE before the device is bound
is about 4x faster than calling it afterwards.

This is an RFC because technically it is a change in behavior; before,
calling LOOP_SET_BLOCK_SIZE on an unbound device would return ENXIO, and
userspace programs that left the call in their code despite the returned
error would now suddenly see the requested value take effect. I'm not
sure whether this is acceptable.

An alternative might be a CONFIG option to set the default block size to
a value other than 512. Another alternative I considered is allowing the
block device to be created with a "frozen" queue, where we can manually
unfreeze the queue when all the configuration is done. This would be a
much larger code change, though.

Hi Martijn,

Is the loop driver used in Android Q to make a file on a filesystem visible as a block device, or rather to make a subset of a block device visible as a block device? In the latter case, have you considered using the dm-linear driver instead? I expect that the per-I/O overhead of dm-linear will be lower than that of the loop driver.
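
To make the dm-linear suggestion concrete: a linear target is described
by a table line of the form "<start> <length> linear <device> <offset>",
with all numbers in 512-byte sectors. The sketch below only builds such
a line; the function name and the values in the usage note are
assumptions made for illustration, not something taken from Android.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a one-line dm-linear table that maps
 * length_sectors sectors of backing_dev, starting at offset_sectors,
 * to sector 0 of the new device. All sizes are in 512-byte sectors.
 * Returns the string length, or -1 if buf is too small.
 */
int format_dm_linear_table(char *buf, size_t buf_len,
			   unsigned long long length_sectors,
			   const char *backing_dev,
			   unsigned long long offset_sectors)
{
	int n = snprintf(buf, buf_len, "0 %llu linear %s %llu",
			 length_sectors, backing_dev, offset_sectors);

	if (n < 0 || (size_t)n >= buf_len)
		return -1;
	return n;
}
```

For example, a length of 2048 sectors on /dev/sda1 at offset 4096
yields "0 2048 linear /dev/sda1 4096", which could then be passed to
dmsetup create with its --table option.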

Bart.