Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap

From: Yu Kuai
Date: Tue Jul 29 2025 - 22:04:28 EST


Hi,

在 2025/07/29 18:16, Jan Kara 写道:
On Tue 29-07-25 11:19:05, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

Currently elevators will record internal 'async_depth' to throttle
asynchronous requests, and they both calculate shallow_dpeth based on
sb->shift, with the respect that sb->shift is the available tags in one
word.

However, sb->shift is not the availbale tags in the last word, see
__map_depth:

if (index == sb->map_nr - 1)
return sb->depth - (index << sb->shift);

For consequence, if the last word is used, more tags can be get than
expected, for example, assume nr_requests=256 and there are four words,
in the worst case if user set nr_requests=32, then the first word is
the last word, and still use bits per word, which is 64, to calculate
async_depth is wrong.

One the other hand, due to cgroup qos, bfq can allow only one request
to be allocated, and set shallow_dpeth=1 will still allow the number
of words request to be allocated.

Fix this problems by using shallow_depth to the whole sbitmap instead
of per word, also change kyber, mq-deadline and bfq to follow this.

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>

I agree with these problems but AFAIU this implementation of shallow depth
has been done for a reason. Omar can chime in here as the original author
or perhaps Jens but the idea of current shallow depth implementation is
that each sbitmap user regardless of used shallow depth has a chance to
allocate from each sbitmap word which evenly distributes pressure among
available sbitmap words. With the implementation you've chosen there will
be higher pressure (and thus contention) on words with low indices.

Yes, this make sense. However, consider that shallow depth is only used
by elevator, this higher pressure should be negligible for deadline and
bfq. As for kyber, this might be a problem.

So I think we would be good to fix issues with shallow depth for small
number of sbitmap words (because that's where these buggy cornercases may
matter in practice) but I believe the logic which constrains number of used
bits from each *word* when shallow_depth is specified should be kept. It
might make sense to change the API so that shallow_depth is indeed
specified compared to the total size of the bitmap, not to the size of the
word (because that's confusing practically everybody I've met and is a
constant source of bugs) if it can be made to perform well.

Do you think will it be ok to add a new shallow depth API to use the
total size, and convert bfq and deadline to use it?

Thanks,
Kuai


Honza