Re: [PATCH 4/6] perf annotate-data: Check memory access with two registers

From: Namhyung Kim
Date: Thu May 02 2024 - 14:15:17 EST


On Thu, May 2, 2024 at 7:05 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>
> On Wed, May 01, 2024 at 11:00:09PM -0700, Namhyung Kim wrote:
> > The following instruction pattern is used to access a global variable.
> >
> > mov $0x231c0, %rax
> > movslq %edi, %rcx
> > mov -0x7dc94ae0(,%rcx,8), %rcx
> > cmpl $0x0, 0xa60(%rcx,%rax,1) <<<--- here
> >
> > The first instruction set the address of the per-cpu variable (here, it
> > is 'runqueus' of struct rq). The second instruction seems like a cpu
>
> You mean 'runqueues', i.e. this one:
>
> kernel/sched/core.c
> DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>
> ?

Right, sorry for the typo.

>
> But that 0xa60 would be in an alignment hole, at least in:
>
> $ pahole --hex rq | egrep 0xa40 -A12
> struct mm_struct * prev_mm; /* 0xa40 0x8 */
> unsigned int clock_update_flags; /* 0xa48 0x4 */
>
> /* XXX 4 bytes hole, try to pack */
>
> u64 clock; /* 0xa50 0x8 */
>
> /* XXX 40 bytes hole, try to pack */
>
> /* --- cacheline 42 boundary (2688 bytes) --- */
> u64 clock_task __attribute__((__aligned__(64))); /* 0xa80 0x8 */
> u64 clock_pelt; /* 0xa88 0x8 */
> long unsigned int lost_idle_time; /* 0xa90 0x8 */
> $ uname -a
> Linux toolbox 6.7.11-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 27 16:50:39 UTC 2024 x86_64 GNU/Linux
> $

The layout depends on the kernel version, the config, and
other changes such as backports or local modifications.

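On my system, 0xa60 falls inside active_balance_work (a struct
cpu_stop_work starting at 0xa40), so the access is to
cpu_stop_work.arg at offset 0x20 of that member.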

$ pahole --hex rq | grep 0xa40 -C1
/* --- cacheline 41 boundary (2624 bytes) --- */
struct cpu_stop_work active_balance_work; /* 0xa40 0x30 */
int cpu; /* 0xa70 0x4 */

$ pahole --hex cpu_stop_work
struct cpu_stop_work {
struct list_head list; /* 0 0x10 */
cpu_stop_fn_t fn; /* 0x10 0x8 */
long unsigned int caller; /* 0x18 0x8 */
void * arg; /* 0x20 0x8 */
struct cpu_stop_done * done; /* 0x28 0x8 */

/* size: 48, cachelines: 1, members: 5 */
/* last cacheline: 48 bytes */
};
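
As a rough illustration of the lookup above, here is a minimal sketch that walks the pahole layouts to turn an offset into a member path. The table only contains the members shown in the output above (so the 'rq' entry is partial), 'list' is treated as a leaf for brevity, and the names LAYOUTS and resolve() are made up for this example; they are not part of perf.

```python
# Layouts copied from the pahole output above.
# Each entry: (member name, offset, size, nested struct name or None).
# NOTE: 'rq' is partial; only the members visible in the grep output are listed.
LAYOUTS = {
    "rq": [
        ("active_balance_work", 0xa40, 0x30, "cpu_stop_work"),
        ("cpu", 0xa70, 0x4, None),
    ],
    "cpu_stop_work": [
        ("list", 0x00, 0x10, None),   # struct list_head, treated as a leaf here
        ("fn", 0x10, 0x8, None),
        ("caller", 0x18, 0x8, None),
        ("arg", 0x20, 0x8, None),
        ("done", 0x28, 0x8, None),
    ],
}


def resolve(type_name, offset):
    """Return the dotted member path covering 'offset', or None for a hole."""
    for name, start, size, nested in LAYOUTS[type_name]:
        if start <= offset < start + size:
            if nested is None:
                return name
            # Descend into the nested struct with the offset made relative to it.
            inner = resolve(nested, offset - start)
            return name if inner is None else f"{name}.{inner}"
    return None


print(resolve("rq", 0xa60))
```

With the layout from my system this reports active_balance_work.arg, since 0xa60 - 0xa40 = 0x20. With a different layout table (like the one from 6.7.11 above) the same offset could land in a hole instead.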


>
> The paragraph then reads:
>
> ----
> The first instruction sets the address of the per-cpu variable (here, it
> is 'runqueues' of type 'struct rq'). The second instruction seems like
> a cpu number of the per-cpu base. The third instruction gets the base
> offset of the per-cpu area for that cpu. The last instruction compares the
> value of the per-cpu variable at the offset of 0xa60.
> ----
>
> Ok?

Yep, looks good.
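
For reference, the address computed by the four instructions can be sketched as below. This is only a model of the arithmetic, assuming the table at -0x7dc94ae0 holds one per-cpu base offset per cpu; the per_cpu_offsets values in the usage line are invented, and the function names are hypothetical.

```python
RUNQUEUES_OFFSET = 0x231c0   # immediate loaded into %rax by the first mov
FIELD_OFFSET = 0xa60         # displacement used by the final cmpl
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32(value):
    """Model movslq: sign-extend a 32-bit value to a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(cpu, per_cpu_offsets):
    """Address accessed by 'cmpl $0x0, 0xa60(%rcx,%rax,1)' for the given cpu."""
    rax = RUNQUEUES_OFFSET            # mov $0x231c0, %rax
    rcx = sign_extend_32(cpu)         # movslq %edi, %rcx
    rcx = per_cpu_offsets[rcx]        # mov -0x7dc94ae0(,%rcx,8), %rcx
    # disp + base + index * scale, with scale = 1, wrapped to 64 bits
    return (FIELD_OFFSET + rcx + rax * 1) & MASK64


# Made-up per-cpu base offsets for two cpus, for illustration only.
offsets = [0xffff888100000000, 0xffff888100040000]
print(hex(effective_address(1, offsets)))
```

The point is that neither register alone identifies the variable: %rax carries the per-cpu variable's address and %rcx the per-cpu base, so the two have to be checked together.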

Thanks,
Namhyung