Re: [PATCH 2/4] Drivers: hv: balloon: account for gaps in hot add regions

From: Vitaly Kuznetsov
Date: Mon Aug 08 2016 - 05:38:19 EST


"Alex Ng (LIS)" <alexng@xxxxxxxxxxxxx> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
>> Sent: Friday, August 5, 2016 3:49 AM
>> To: devel@xxxxxxxxxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>;
>> KY Srinivasan <kys@xxxxxxxxxxxxx>; Alex Ng (LIS) <alexng@xxxxxxxxxxxxx>
>> Subject: [PATCH 2/4] Drivers: hv: balloon: account for gaps in hot add regions
>>
>> I'm observing the following hot add requests from the WS2012 host:
>>
>> hot_add_req: start_pfn = 0x108200 count = 330752
>> hot_add_req: start_pfn = 0x158e00 count = 193536
>> hot_add_req: start_pfn = 0x188400 count = 239616
>>
>> As the host doesn't specify hot add regions we're trying to create 128Mb-
>> aligned region covering the first request, we create the 0x108000 -
>> 0x160000 region and we add 0x108000 - 0x158e00 memory. The second
>> request passes the pfn_covered() check, we enlarge the region to 0x108000 -
>> 0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with
>> the third request as it starts at 0x188400 so there is a 0x200 gap which is not
>> covered. As the end of our region is 0x190000 now it again passes the
>> pfn_covered() check where we just adjust the covered_end_pfn and make it
>> 0x188400 instead of 0x188200 which means that we'll try to online
>> 0x188200-0x188400 pages but these pages were never assigned to us and we
>> crash.
>
> The fact that the host sent a request that's non-contiguous with the previous
> request is unexpected. Could we check to see the number of pages we returned
> in our response, after each request?
>
> I'm wondering if we may have given a wrong response to cause the host to
> follow-up with a gapped request.

That does not seem to be the case. Here is the recorded session (addresses
are hex, counts are decimal):

[ 66.851401] DM: hot_add_req: 108200 303104 0 0

-> we were asked to add 303104 pages ...

[ 66.854420] DM: handle_pg_range: 108200 303104
[ 84.489291] DM: handle_pg_range: return 303104
[ 84.492498] DM: hot_add_req: ret 303104

-> and we returned '303104'

[ 131.934542] DM: hot_add_req: 152200 221184 0 0

-> we were asked to add 221184 pages ...

[ 131.937495] DM: handle_pg_range: 152200 221184
[ 132.720390] DM: handle_pg_range: return 221184
[ 132.722953] DM: hot_add_req: ret 221184

-> and we returned '221184'

[ 132.958045] DM: hot_add_req: 188400 409088 0 0

-> and here we were asked to add pages with a gap (0x108200 + 303104 +
221184 = 0x188200, but as you can see the new range starts at 0x188400,
so 0x200 pages are left uncovered)

[ 132.961409] DM: handle_pg_range: 188400 409088
[ 134.012555] DM: handle_pg_range: return 409088
[ 134.013862] DM: hot_add_req: ret 409088

So I don't see a flaw on the Linux side ...
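
The arithmetic above can be re-checked with a small script. This is only a
sketch that replays the hot_add_req values from the session; the names
PAGE_REQUESTS and find_gaps are made up for illustration and are not part of
the driver.

```python
# Sketch only: replays the (start_pfn, count) pairs from the hot_add_req
# lines in the recorded session. Names here are illustrative, not driver code.
PAGE_REQUESTS = [
    (0x108200, 303104),
    (0x152200, 221184),
    (0x188400, 409088),
]


def find_gaps(requests):
    """Return (expected_start, actual_start, gap_pages) for every request
    that does not start where the previous one ended."""
    gaps = []
    expected = None
    for start_pfn, count in requests:
        if expected is not None and start_pfn != expected:
            gaps.append((expected, start_pfn, start_pfn - expected))
        # The next contiguous request would start right after this one.
        expected = start_pfn + count
    return gaps


for expected, actual, gap in find_gaps(PAGE_REQUESTS):
    print(f"expected start {expected:#x}, got {actual:#x}, gap {gap:#x} pages")
```

With the values from this session it reports only the third request, which
starts at 0x188400 where 0x188200 was expected.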

--
Vitaly