Re: [PATCH] perf maps: Process kcore maps in order

From: Adrian Hunter
Date: Mon May 06 2024 - 01:43:18 EST


On 5/05/24 23:28, Leo Yan wrote:
> On Arm64, after enabling the 'DEBUG=1' build option, the tool exits
> abnormally:
>
> # ./perf report --itrace=Zi10ibT
> # perf: util/maps.c:42: check_invariants: Assertion `map__end(prev) <= map__start(map) || map__start(prev) == map__start(map)' failed.
> Aborted
>
> The details for causing this error are described in below.
>
> Firstly, machine__get_running_kernel_start() calculates the delta
> between the '_stext' symbol and the '_edata' symbol for the kernel map,
> alongside with eBPF maps:
>
> DSO | Start address | End address
> -----------------+--------------------+--------------------
> kernel.kallsyms 0xffff800080000000 0xffff800082229200
> bpf_prog 0xffff800082545aac 0xffff800082545c94
> ...
>
> Then, the perf tool retrieves kcore maps:
>
> Kcore maps | Start address | End address
> -----------------+--------------------+--------------------
> kcore_text 0xffff800080000000 0xffff8000822f0000
> vmalloc 0xffff800080000000 0xfffffdffbf800000
> ...
>
> Finally, the function dso__load_kcore() extends the kernel maps based on
> the retrieved kcore info. Since it uses the inverted order for
> processing the kcore maps, it extends maps for the vmalloc region prior
> to the 'kcore_text' map:
>
> DSO | Start address | End address
> -----------------+--------------------+--------------------
> kernel.kallsyms 0xffff800080000000 0xffff800082229200
> kernel.kallsyms 0xffff800082229200 0xffff800082545aac -> Extended for vmalloc region
> bpf_prog 0xffff800082545aac 0xffff800082545c94
> ...
>
> DSO | Start address | End address
> -----------------+--------------------+--------------------
> kernel.kallsyms 0xffff800080000000 0xffff8000822f0000 -> Updated for kcore_text map
> kernel.kallsyms 0xffff800082229200 0xffff800082545aac
> bpf_prog 0xffff800082545aac 0xffff800082545c94
> ...
>
> As result, the two maps [0xffff800080000000..0xffff8000822f0000) and
> [0xffff800082229200..0xffff800082545aac) are overlapping and triggers
> failure.
>
> The current code processes kcore maps in inverted order. To fix it, this
> patch adds kcore maps in the tail of list, afterwards these maps will be
> processed in the order. Therefore, the kernel text section can be
> processed prior to handling the vmalloc region, which avoids using the
> inaccurate kernel text size when extending maps with the vmalloc region.
>
> Signed-off-by: Leo Yan <leo.yan@xxxxxxx>
> ---
> tools/perf/util/symbol.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 9ebdb8e13c0b..e15d70845488 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1266,7 +1266,24 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
> map__set_end(list_node->map, map__start(list_node->map) + len);
> map__set_pgoff(list_node->map, pgoff);
>
> - list_add(&list_node->node, &md->maps);
> + /*
> + * Kcore maps are ordered with:
> + * [_text.._end): Kernel text section
> + * [VMALLOC_START..VMALLOC_END): vmalloc
> + * ...
> + *
> + * On Arm64, the '_text' and 'VMALLOC_START' are the same values
> + * but VMALLOC_END (~124TiB) is much bigger then the text end
> + * address. So '_text' region is the subset of the vmalloc region.
> + *
> + * Afterwards, when dso__load_kcore() adjusts kernel maps, we must
> + * process the kernel text size prior to handling vmalloc region.
> + * This can avoid to using any inaccurate kernel text size when
> + * extending maps with vmalloc region. For this reason, here it
> + * always adds kcore maps to the tail of list to make sure the
> + * sequential handling is in order.
> + */
> + list_add_tail(&list_node->node, &md->maps);

This seems reasonable, but I wonder if it might be robust
and future proof to also process the main map first
e.g. totally untested:

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9ebdb8e13c0b..63bce45a5abb 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1365,16 +1365,15 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
if (!replacement_map)
replacement_map = list_entry(md.maps.next, struct map_list_node, node)->map;

- /* Add new maps */
+ /* Add replacement_map */
while (!list_empty(&md.maps)) {
struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
struct map *new_map = new_node->map;

- list_del_init(&new_node->node);
-
if (RC_CHK_EQUAL(new_map, replacement_map)) {
struct map *map_ref;

+ list_del_init(&new_node->node);
map__set_start(map, map__start(new_map));
map__set_end(map, map__end(new_map));
map__set_pgoff(map, map__pgoff(new_map));
@@ -1385,20 +1384,29 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
err = maps__insert(kmaps, map_ref);
map__put(map_ref);
map__put(new_map);
+ free(new_node);
if (err)
goto out_err;
- } else {
- /*
- * Merge kcore map into existing maps,
- * and ensure that current maps (eBPF)
- * stay intact.
- */
- if (maps__merge_in(kmaps, new_map)) {
- err = -EINVAL;
- goto out_err;
- }
}
+ }
+
+ /* Add new maps */
+ while (!list_empty(&md.maps)) {
+ struct map_list_node *new_node = list_entry(md.maps.next, struct map_list_node, node);
+ struct map *new_map = new_node->map;
+
+ list_del_init(&new_node->node);
+
+ /*
+ * Merge kcore map into existing maps,
+ * and ensure that current maps (eBPF)
+ * stay intact.
+ */
+ if (maps__merge_in(kmaps, new_map))
+ err = -EINVAL;
free(new_node);
+ if (err)
+ goto out_err;
}

if (machine__is(machine, "x86_64")) {