Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]

From: Josh Boyer
Date: Wed Sep 16 2015 - 13:57:07 EST


On Tue, Sep 15, 2015 at 8:14 AM, Josh Boyer <jwboyer@xxxxxxxxxxxxxxxxx> wrote:
> On Sat, Sep 12, 2015 at 9:19 AM, Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
>> On Fri, 11 Sep 2015 17:43:15 -0400
>> Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>>
>>> Ming, Jens, others:
>>>
>>> Please see this BZ comment that speaks to a 4.3 regression due to the
>>> late bio splitting changes:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
>>
>> I think it is a bug of bounce_end_io, and the following patch may
>> fix it.
>>
>> ----
>> From 08df0db0be41e6bea306bcf5b4d325f5a79dc7a1 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> Date: Sat, 12 Sep 2015 20:48:42 +0800
>> Subject: [PATCH] block: fix bounce_end_io
>>
>> When bio bounce is involved, one new bio and its io vector are
>> cloned from the coming bio, which can be one fast-cloned bio
>> and its io vector can be shared with another bio too, especially
>> after bio_split() is introduced.
>>
>> So it is obviously wrong to assume the start index of the original
>> bio's io vector is zero, which can be any value between 0 and
>> (bi_max_vecs - 1), especially in case of bio split.
>>
>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> ---
>> block/bounce.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/bounce.c b/block/bounce.c
>> index 0611aea..1cb5dd3 100644
>> --- a/block/bounce.c
>> +++ b/block/bounce.c
>> @@ -128,12 +128,14 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
>> struct bio *bio_orig = bio->bi_private;
>> struct bio_vec *bvec, *org_vec;
>> int i;
>> + int start = bio_orig->bi_iter.bi_idx;
>>
>> /*
>> * free up bounce indirect pages used
>> */
>> bio_for_each_segment_all(bvec, bio, i) {
>> - org_vec = bio_orig->bi_io_vec + i;
>> + org_vec = bio_orig->bi_io_vec + i + start;
>> +
>> if (bvec->bv_page == org_vec->bv_page)
>> continue;
>>
>> --
>> 1.9.1
>>
>>> But inlined here so we can continue on list:
>>> (In reply to Josh Boyer from comment #40)
>>> > The function that was fixed in 4.2 doesn't exist any longer in
>>> > 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
>>> > v4.2-6105-gdd5cdb48edfd which contains commit
>>> > 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
>>> > whatever fix was made in dm_merge_bvec doesn't seem to have made it to
>>> > whatever replaced it.
>>>
>>> The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
>>> dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
>>> a clear equivalent in the late bio splitting code that replaced block
>>> core's merge_bvec logic.
>>>
>>> merge_bvec was all about limiting bios (by asking "can/should this page
>>> be added to this bio?") whereas the late bio splitting is more "build
>>> the bios as large as possible and worry about splitting later".
>>
>> IMO, given one vector can only point to one page, there shouldn't
>> have difference between the two.
>>
>>>
>>> Regardless, this regression needs to be reported to Ming Lin
>>> <ming.l@xxxxxxxxxxxxxxx>, Jens Axboe and the others involved in
>>> maintaining the late bio splitting changes in block core.
>>>
>>> Josh and/or Adam: it would _really_ help if the regression test you guys
>>> are using could be handed-over and/or explained to us. Is it as simple
>>> as loading a 32bit with a particular config? Can you share the guest
>>> image if it is small enough?
>>
>> Josh, Adam, would you mind testing the above patch to see if it can fix
>> your issue?
>
> Sorry for the delay in reply. I'll try and work with Adam today to
> get this tested.

FWIW, reproducing the environment to recreate this is rather difficult
at the moment for reasons unrelated to the kernel. We're going to add
the patch to our rawhide kernel so it gets pulled into tomorrow's
compose. We'll test it as soon as it is available and let you know.

josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/