Re: Bug report about KASLR and ZONE_MOVABLE

From: Chao Fan
Date: Thu Jul 12 2018 - 21:46:14 EST


On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote:
>Hi Michal,
>
>On 07/12/18 at 02:32pm, Michal Hocko wrote:
>> On Thu 12-07-18 14:01:15, Chao Fan wrote:
>> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>> > >Hi Baoquan,
>> > >
>> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
>> > >> Please try this v3 patch:
>> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> > >> From: Baoquan He <bhe@xxxxxxxxxx>
>> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> > >>
>> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> > >> PFN movable zone begins in each node, kernel text position is not
>> > >> considered. KASLR may put kernel after which movable zone begins.
>> > >>
>> > >> Fix it by finding movable zone after kernel text on that node.
>> > >>
>> > >> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
>> > >
>> > >
>> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
>> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
>> > >last node, or more.
>> >
>> > I think it may not fail.
>> > There is a 'restart' to do another pass.
>> >
>> > >
>> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
>> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
>> > >this in the KASLR side.
>> > >
>> >
>> > I think it's better to fix here, but not KASLR side.
>> > Cause much more code will be change if doing it in KASLR side.
>> > Since we didn't parse 'kernelcore' in compressed code, and you can see
>> > the distribution of ZONE_MOVABLE need so much code, so we do not need
>> > to do so much job in KASLR side. But here, several lines will be OK.
>>
>> I am not able to find the beginning of the email thread right now. Could
>> you summarize what is the actual problem please?
>
>The bug is found on x86 now.
>
>When added "kernelcore=" or "movablecore=" into kernel command line,
>kernel memory is spread evenly among nodes. However, this is right when
>KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
>If KASLR enabled, it could be put any place from 16M to 64T randomly.
>
>Consider a scenario, we have 10 nodes, and each node has 20G memory, and
>we specify "kernelcore=50%", means each node will take 10G for
>kernelcore, 10G for movable area. But this doesn't take kernel position
>into consideration. E.g if kernel is put at 15G of 2nd node, namely
>node1. Then we think on node1 there's 10G for kernelcore, 10G for
>movable, in fact there's only 5G available for movable, just after
>kernel.
>
>I made a v4 patch which possibly can fix it.
>
>
>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
>From: Baoquan He <bhe@xxxxxxxxxx>
>Date: Fri, 13 Jul 2018 07:49:29 +0800
>Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when try to find the starting
>PFN movable zone begins at in each node, kernel text position is not
>considered. KASLR may put kernel after which movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>

You can post it as alone PATCH, then I will test it next week.

Thanks,
Chao Fan

>---
> mm/page_alloc.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100f1e63..5bc1a47dafda 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
> int i, nid;
>- unsigned long usable_startpfn;
>+ unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
> unsigned long kernelcore_node, kernelcore_remaining;
> /* save the state before borrow the nodemask */
> nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> if (!required_kernelcore || required_kernelcore >= totalpages)
> goto out;
>
>+ kernel_endpfn = PFN_UP(__pa_symbol(_end));
> /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>- usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>+ arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>
> restart:
> /* Spread kernelcore memory as evenly as possible throughout nodes */
>@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> unsigned long start_pfn, end_pfn;
>
> /*
>+ * KASLR may put kernel near tail of node memory,
>+ * start after kernel on that node to find PFN
>+ * at which zone begins.
>+ */
>+ if (pfn_to_nid(kernel_endpfn) == nid)
>+ usable_startpfn = max(arch_startpfn, kernel_endpfn);
>+ else
>+ usable_startpfn = arch_startpfn;
>+
>+ /*
> * Recalculate kernelcore_node if the division per node
> * now exceeds what is necessary to satisfy the requested
> * amount of memory for the kernel
>--
>2.13.6
>
>
>