Re: [PATCH] perf maps: Process kcore maps in order

From: Adrian Hunter
Date: Mon May 06 2024 - 01:46:35 EST


On 6/05/24 08:43, Adrian Hunter wrote:
> On 5/05/24 23:28, Leo Yan wrote:
>> On Arm64, after enabling the 'DEBUG=1' build option, the tool exits
>> abnormally:
>>
>> # ./perf report --itrace=Zi10ibT
>> # perf: util/maps.c:42: check_invariants: Assertion `map__end(prev) <= map__start(map) || map__start(prev) == map__start(map)' failed.
>> Aborted
>>
>> The details for causing this error are described in below.
>>
>> Firstly, machine__get_running_kernel_start() calculates the delta
>> between the '_stext' symbol and the '_edata' symbol for the kernel map,
>> alongside with eBPF maps:
>>
>> DSO | Start address | End address
>> -----------------+--------------------+--------------------
>> kernel.kallsyms 0xffff800080000000 0xffff800082229200
>> bpf_prog 0xffff800082545aac 0xffff800082545c94
>> ...
>>
>> Then, the perf tool retrieves kcore maps:
>>
>> Kcore maps | Start address | End address
>> -----------------+--------------------+--------------------
>> kcore_text 0xffff800080000000 0xffff8000822f0000
>> vmalloc 0xffff800080000000 0xfffffdffbf800000
>> ...
>>
>> Finally, the function dso__load_kcore() extends the kernel maps based on
>> the retrieved kcore info. Since it uses the inverted order for
>> processing the kcore maps, it extends maps for the vmalloc region prior
>> to the 'kcore_text' map:
>>
>> DSO | Start address | End address
>> -----------------+--------------------+--------------------
>> kernel.kallsyms 0xffff800080000000 0xffff800082229200
>> kernel.kallsyms 0xffff800082229200 0xffff800082545aac -> Extended for vmalloc region
>> bpf_prog 0xffff800082545aac 0xffff800082545c94
>> ...
>>
>> DSO | Start address | End address
>> -----------------+--------------------+--------------------
>> kernel.kallsyms 0xffff800080000000 0xffff8000822f0000 -> Updated for kcore_text map
>> kernel.kallsyms 0xffff800082229200 0xffff800082545aac
>> bpf_prog 0xffff800082545aac 0xffff800082545c94
>> ...
>>
>> As result, the two maps [0xffff800080000000..0xffff8000822f0000) and
>> [0xffff800082229200..0xffff800082545aac) are overlapping and triggers
>> failure.
>>
>> The current code processes kcore maps in inverted order. To fix it, this
>> patch adds kcore maps in the tail of list, afterwards these maps will be
>> processed in the order. Therefore, the kernel text section can be
>> processed prior to handling the vmalloc region, which avoids using the
>> inaccurate kernel text size when extending maps with the vmalloc region.
>>
>> Signed-off-by: Leo Yan <leo.yan@xxxxxxx>
>> ---
>> tools/perf/util/symbol.c | 19 ++++++++++++++++++-
>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
>> index 9ebdb8e13c0b..e15d70845488 100644
>> --- a/tools/perf/util/symbol.c
>> +++ b/tools/perf/util/symbol.c
>> @@ -1266,7 +1266,24 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
>> map__set_end(list_node->map, map__start(list_node->map) + len);
>> map__set_pgoff(list_node->map, pgoff);
>>
>> - list_add(&list_node->node, &md->maps);
>> + /*
>> + * Kcore maps are ordered with:
>> + * [_text.._end): Kernel text section
>> + * [VMALLOC_START..VMALLOC_END): vmalloc
>> + * ...
>> + *
>> + * On Arm64, the '_text' and 'VMALLOC_START' are the same values
>> + * but VMALLOC_END (~124TiB) is much bigger then the text end
>> + * address. So '_text' region is the subset of the vmalloc region.
>> + *
>> + * Afterwards, when dso__load_kcore() adjusts kernel maps, we must
>> + * process the kernel text size prior to handling vmalloc region.
>> + * This can avoid to using any inaccurate kernel text size when
>> + * extending maps with vmalloc region. For this reason, here it
>> + * always adds kcore maps to the tail of list to make sure the
>> + * sequential handling is in order.
>> + */
>> + list_add_tail(&list_node->node, &md->maps);
>
> This seems reasonable, but I wonder if it might be robust
> and future proof to also process the main map first
> e.g. totally untested:
>
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 9ebdb8e13c0b..63bce45a5abb 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1365,16 +1365,15 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
> if (!replacement_map)
> replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;
>
> - /* Add new maps */
> + /* Add replacement_map */
> while (!list_empty(&md.maps)) {

That will need to be:

struct map_list_node *new_node;
..
list_for_each_entry(new_node, &md.maps, node) {


> struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
> struct map *new_map = new_node->map;
>
> - list_del_init(&new_node->node);
> -
> if (RC_CHK_EQUAL(new_map, replacement_map)) {
> struct map *map_ref;
>
> + list_del_init(&new_node->node);
> map__set_start(map, map__start(new_map));
> map__set_end(map, map__end(new_map));
> map__set_pgoff(map, map__pgoff(new_map));
> @@ -1385,20 +1384,29 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
> err = maps__insert(kmaps, map_ref);
> map__put(map_ref);
> map__put(new_map);
> + free(new_node);
> if (err)
> goto out_err;
> - } else {
> - /*
> - * Merge kcore map into existing maps,
> - * and ensure that current maps (eBPF)
> - * stay intact.
> - */
> - if (maps__merge_in(kmaps, new_map)) {
> - err = -EINVAL;
> - goto out_err;
> - }
> }
> + }
> +
> + /* Add new maps */
> + while (!list_empty(&md.maps)) {
> + struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
> + struct map *new_map = new_node->map;
> +
> + list_del_init(&new_node->node);
> +
> + /*
> + * Merge kcore map into existing maps,
> + * and ensure that current maps (eBPF)
> + * stay intact.
> + */
> + if (maps__merge_in(kmaps, new_map))
> + err = -EINVAL;
> free(new_node);
> + if (err)
> + goto out_err;
> }
>
> if (machine__is(machine, "x86_64")) {
>
>