Re: [PATCH] perf: use standard syscall tracepoint structs for augmentation

From: Namhyung Kim
Date: Fri Aug 08 2025 - 18:22:06 EST


On Thu, Aug 07, 2025 at 05:16:07PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Aug 06, 2025 at 04:07:32PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > On Wed, Aug 06, 2025 at 03:00:17PM +0200, Jakub Brnak wrote:
> > > Replace custom syscall structs with the standard trace_event_raw_sys_enter
> > > and trace_event_raw_sys_exit from vmlinux.h.
> > > This fixes a data structure misalignment issue discovered on RHEL-9, which
> > > prevented BPF programs from correctly accessing syscall arguments.
> >
> > Can you explain what the alignment issue was? It's not clear to me what
> > makes it misaligned.
>
> Yeah, mentioning a "misalignment" and then not spelling it out precisely
> doesn't help.
>
> Showing the pahole output of the expected structure layout in both
> kernels and what was being used would help us to understand the problem.
>
> For instance, here I have:
>
> acme@x1:~/git/bpf-next$ uname -r
> 6.15.5-200.fc42.x86_64
> acme@x1:~/git/bpf-next$ pahole -E trace_event_raw_sys_enter
> struct trace_event_raw_sys_enter {
> 	struct trace_entry {
> 		short unsigned int type; /* 0 2 */
> 		unsigned char flags; /* 2 1 */
> 		unsigned char preempt_count; /* 3 1 */
> 		int pid; /* 4 4 */
> 	} ent; /* 0 8 */
> 	long int id; /* 8 8 */
> 	long unsigned int args[6]; /* 16 48 */
> 	/* --- cacheline 1 boundary (64 bytes) --- */
> 	char __data[]; /* 64 0 */
>
> 	/* size: 64, cachelines: 1, members: 4 */
> };
>
> acme@x1:~/git/bpf-next$
>
> And:
>
> ⬢ [acme@toolbx linux]$ pahole -C syscall_enter_args /tmp/build/linux/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o
> struct syscall_enter_args {
> 	unsigned long long common_tp_fields; /* 0 8 */
> 	long syscall_nr; /* 8 8 */
> 	unsigned long args[6]; /* 16 48 */
>
> 	/* size: 64, cachelines: 1, members: 3 */
> };
>
> ⬢ [acme@toolbx linux]$
>
> So yes, it is "aligned": the 'id' is the 'syscall_nr', both are at
> offset 8, and the syscall args start at offset 16 in both cases.
>
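
One way to double-check the comparison above without a BPF toolchain is to mirror both layouts as plain C structs and compare offsetof() values. The definitions below are hand-copied from the two pahole dumps; the '_upstream' suffix and the two helper functions are hypothetical names used only for this sketch, and it assumes an LP64 target such as x86_64, where long is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-copied from pahole -E on 6.15.5-200.fc42.x86_64.
 * The _upstream suffix is hypothetical, only to avoid a name clash. */
struct trace_entry_upstream {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
};

struct trace_event_raw_sys_enter_upstream {
	struct trace_entry_upstream ent;
	long id;
	unsigned long args[6];
	char __data[];
};

/* perf's private copy, as shown by pahole -C syscall_enter_args. */
struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long syscall_nr;
	unsigned long args[6];
};

/* The sketch assumes LP64 (x86_64): long is 8 bytes. */
static_assert(sizeof(long) == 8, "sketch assumes an LP64 target");
static_assert(sizeof(struct trace_event_raw_sys_enter_upstream) == 64,
	      "upstream record should be 64 bytes");

/* Both layouts put the syscall number at offset 8 and args[] at 16. */
static_assert(offsetof(struct trace_event_raw_sys_enter_upstream, id) ==
	      offsetof(struct syscall_enter_args, syscall_nr),
	      "syscall number offsets differ");
static_assert(offsetof(struct trace_event_raw_sys_enter_upstream, args) ==
	      offsetof(struct syscall_enter_args, args),
	      "args offsets differ");

/* Hypothetical helpers exposing the offsets for runtime checks. */
size_t upstream_args_offset(void)
{
	return offsetof(struct trace_event_raw_sys_enter_upstream, args);
}

size_t perf_args_offset(void)
{
	return offsetof(struct syscall_enter_args, args);
}
```

If the two layouts ever drift apart, the static_asserts fail at compile time instead of perf silently reading the wrong syscall arguments at run time.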
> The RHEL9 layout is where we see the issue: the id (syscall_nr) is at
> offset 16 and args[] starts at offset 24; there is the misalignment:
>
> sh-5.1# pahole -E -C trace_event_raw_sys_enter /usr/lib/debug/lib/modules/5.14.0-570.32.1.el9_6.x86_64/vmlinux
> struct trace_event_raw_sys_enter {
> 	struct trace_entry {
> 		short unsigned int type; /* 0 2 */
> 		unsigned char flags; /* 2 1 */
> 		unsigned char preempt_count; /* 3 1 */
> 		int pid; /* 4 4 */
> 		unsigned char preempt_lazy_count; /* 8 1 */

Oh, RHEL9 has this new field.


> 	} ent; /* 0 12 */
>
> 	/* XXX last struct has 3 bytes of padding */
> 	/* XXX 4 bytes hole, try to pack */
>
> 	long int id; /* 16 8 */
> 	long unsigned int args[6]; /* 24 48 */
> 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> 	char __data[]; /* 72 0 */
>
> 	/* size: 72, cachelines: 2, members: 4 */
> 	/* sum members: 68, holes: 1, sum holes: 4 */
> 	/* paddings: 1, sum paddings: 3 */
> 	/* last cacheline: 8 bytes */
> };
>
> sh-5.1#
>
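
A minimal sketch of what goes wrong on RHEL9, assuming x86_64: build a sys_enter record with the layout from the pahole dump above (the extra preempt_lazy_count field makes 'ent' 12 bytes, and the 8-byte alignment of 'id' then adds a 4-byte hole), and decode it through perf's fixed 'struct syscall_enter_args'. The '_rhel9' suffix and misread_first_arg() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-copied from pahole on 5.14.0-570.32.1.el9_6.x86_64.
 * The _rhel9 suffix is hypothetical, only to avoid a name clash. */
struct trace_entry_rhel9 {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
	unsigned char preempt_lazy_count; /* extra field: sizeof becomes 12 */
};

struct trace_event_raw_sys_enter_rhel9 {
	struct trace_entry_rhel9 ent; /* 12 bytes, followed by a 4-byte hole */
	long id;                      /* offset 16 */
	unsigned long args[6];        /* offset 24 */
	char __data[];
};

/* perf's private layout, which assumes id at 8 and args at 16. */
struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long syscall_nr;
	unsigned long args[6];
};

static_assert(sizeof(long) == 8, "sketch assumes an LP64 target");
static_assert(sizeof(struct trace_entry_rhel9) == 12, "ent is 12 bytes");
static_assert(offsetof(struct trace_event_raw_sys_enter_rhel9, id) == 16, "id at 16");
static_assert(offsetof(struct trace_event_raw_sys_enter_rhel9, args) == 24, "args at 24");
static_assert(sizeof(struct trace_event_raw_sys_enter_rhel9) == 72, "record is 72 bytes");

/* Build a record as the RHEL9 kernel lays it out, then decode it with
 * perf's layout. Returns what perf would report as args[0]. */
unsigned long misread_first_arg(long nr, unsigned long first_arg)
{
	struct trace_event_raw_sys_enter_rhel9 rec;
	struct syscall_enter_args view;

	memset(&rec, 0, sizeof(rec));
	rec.id = nr;
	rec.args[0] = first_arg;

	/* perf's struct is 64 bytes; the RHEL9 record is 72, so only the
	 * first 64 bytes of the record are copied. */
	memcpy(&view, &rec, sizeof(view));
	return view.args[0]; /* offset 16 in rec: this is rec.id, not args[0] */
}
```

With this layout, misread_first_arg(257, 0x1234) returns 257: what perf takes to be args[0] is really the syscall number, and every later argument is shifted by one slot in the same way.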
> So if we always use 'struct trace_event_raw_sys_enter' from the
> vmlinux.h generated from the BTF info, and have it all CO-RE enabled
> (preserving the access index of fields, etc.), it will work on any
> kernel you install on the machine.
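
The CO-RE point can be modelled in plain userspace C: rather than baking the offset of 'args' in at compile time, look it up by field name in the type information of the kernel that produced the record. This is only a rough analogy for what libbpf does with the field relocations clang records for vmlinux.h types (bpftool-generated vmlinux.h applies __attribute__((preserve_access_index)) to them): libbpf resolves those once, against the running kernel's BTF, and patches the instructions at load time, whereas this model does the lookup on every call for simplicity. 'struct field_off', field_offset(), read_arg() and the two offset tables are hypothetical; the offsets come from the pahole dumps in this thread and assume x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-kernel field info BTF provides. */
struct field_off {
	const char *name;
	size_t offset;
};

/* Offsets of trace_event_raw_sys_enter fields, from the pahole dumps. */
static const struct field_off upstream_6_15[] = {
	{ "id", 8 }, { "args", 16 }, { NULL, 0 },
};
static const struct field_off rhel9_5_14[] = {
	{ "id", 16 }, { "args", 24 }, { NULL, 0 },
};

/* Resolve a field by name against the target kernel's table;
 * returns (size_t)-1 if the field does not exist there. */
static size_t field_offset(const struct field_off *kernel, const char *name)
{
	for (; kernel->name; kernel++)
		if (strcmp(kernel->name, name) == 0)
			return kernel->offset;
	return (size_t)-1;
}

/* Read args[i] from a raw sys_enter record using the offset resolved for
 * the kernel that produced it, not a compile-time constant. libbpf would
 * resolve this once at load time; the model looks it up on every call. */
unsigned long read_arg(const unsigned char *rec, const struct field_off *kernel, int i)
{
	unsigned long val;
	size_t off = field_offset(kernel, "args");

	assert(off != (size_t)-1);
	memcpy(&val, rec + off + (size_t)i * sizeof(val), sizeof(val));
	return val;
}
```

With a zeroed 72-byte record holding 42 at offset 24 (RHEL9's args[0]), read_arg(rec, rhel9_5_14, 0) returns 42, while resolving against the upstream table reads offset 16 instead, the slot that holds 'id' on RHEL9.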

Agreed, thanks.
Namhyung