Re: [PATCH] block: copy bi_vcnt in __bio_clone_fast

From: Ming Lei
Date: Thu Oct 09 2014 - 12:13:56 EST


On Thu, Oct 9, 2014 at 11:25 PM, Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 9, 2014 at 10:26 PM, Jeff Mahoney <jeffm@xxxxxxxx> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 10/9/14, 9:53 AM, Jeff Moyer wrote:
>>> Jeff Mahoney <jeffm@xxxxxxxx> writes:
>>>
>>>> Commit 05f1dd53152173 (block: add queue flag for disabling SG
>>>> merging) uses bi_vcnt to assign bio->bi_phys_segments if sg
>>>> merging is disabled. When using device mapper on top of a blk-mq
>>>> device (virtio_blk in my test), we'd end up overflowing the
>>>> scatterlist in __blk_bios_map_sg.
>>>>
>>>> __bio_clone_fast copies bi_iter and bi_io_vec but not bi_vcnt,
>>>> so blk_recount_segments would report bi_phys_segments as 0.
>>>> Since rq->nr_phys_segments is 0 as well, the checks to ensure
>>>> that we don't exceed the queue's segment limit end up allowing
>>>> more bios (and segments) to attach the a request until we finally
>>>> map it. That also means we pass the BUG_ON at the beginning of
>>>> virtio_queue_rq, ultimately causing memory corruption and a
>>>> crash.
>>>>
>>>> If we copy bi_vcnt in __bio_clone_fast, the bios and requests
>>>> properly report the number of segments and everything works as
>>>> expected.
>>>>
>>>> Originally reported at
>>>> http://bugzilla.opensuse.org/show_bug.cgi?id=888259
>>>
>>> Hi, Jeff,
>>>
>>> Did you manage to reproduce this problem with commit 0738854
>>> (blk-merge: fix blk_recount_segments) applied? Or perhaps with
>>> commit 200612e (dm table: propagate QUEUE_FLAG_NO_SG_MERGE)?
>>
>> Yep. I was able to reproduce it with 3.17. I did try 0738854 when I
>> was still using 3.16 since it looked like a good candidate. Neither of
>> those patches affect the problem here. bio->bi_phys_segments never
>> gets a value set in the fast clone case and that translates to
>> req->nr_phys_segments never getting properly accumulated. That might
>> not be a problem except that the NO_SG_MERGE behavior bypasses the
>> iteration that would come up with the correct value. In either case,
>
> This patch may get incorrect rq->nr_phys_segments because
> bio cloning is often used in case of I/O splitting, so could you
> test if the attached patch fixes your problem?

Please ignore last patch and test the attached patch since we
still should use no sg merge to recalculate req's segments for
cloned bio.

Thanks,
From fd8be35c1b1e35781137b2020dd29552c3f8eb98 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@xxxxxxxxxxxxx>
Date: Thu, 9 Oct 2014 23:17:35 +0800
Subject: [PATCH] blk-merge: don't compute bi_phys_segments from bi_vcnt for
cloned bio

It isn't correct to figure out req->bi_phys_segments from bio->bi_vcnt
if the bio is cloned.

Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
---
block/blk-merge.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f71bad3..ba99351 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -97,14 +97,18 @@ void blk_recalc_rq_segments(struct request *rq)

void blk_recount_segments(struct request_queue *q, struct bio *bio)
{
- if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
+ bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
+ &q->queue_flags);
+
+ if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) &&
bio->bi_vcnt < queue_max_segments(q))
bio->bi_phys_segments = bio->bi_vcnt;
else {
struct bio *nxt = bio->bi_next;

bio->bi_next = NULL;
- bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
+ bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio,
+ no_sg_merge);
bio->bi_next = nxt;
}

--
1.7.9.5