Re: [PATCH v12 06/14] unwind_user/deferred: Add deferred unwinding interface

From: Steven Rostedt
Date: Wed Jul 02 2025 - 13:26:24 EST

Next message: Dragos Tatulea: "[RFC net-next 0/4] devmem/io_uring: Allow devices without parent PCI device"
Previous message: Jeff Johnson: "Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again"
In reply to: Linus Torvalds: "Re: [PATCH v12 06/14] unwind_user/deferred: Add deferred unwinding interface"
Next in thread: Steven Rostedt: "Re: [PATCH v12 06/14] unwind_user/deferred: Add deferred unwinding interface"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 2 Jul 2025 09:56:39 -0700
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Also, does it actually have to be entirely unique? IOW, a 32-bit
> counter (or even less) might be sufficient if there's some guarantee
> that processing happens before the counter wraps around? Again - for
> correlation purposes, just *how* many outstanding events can you have
> that aren't ordered by other things too?
>
> I'm sure people want to also get some kind of rough time idea, but
> don't most perf events have them simply because people want time
> information for _informational_ reasons, rather than to correlate two
> events?

And it only needs to be unique per thread per system call. The real
reason for this identifier is for lost events. As I explained in the
perf patchset, the issue is this:

In case of dropped events, we could have the case of:

  system_call() {
    <nmi> {
      take kernel stack trace
      ask for deferred trace.

      [EVENTS START DROPPING HERE]
    }
    Call deferred callback to record trace [ BUT IS DROPPED ]
  }

  system_call() {
    <nmi> {
      take kernel stack trace
      ask for deferred trace [ STILL DROPPING ]
    }
    [ READER CATCHES UP AND STARTS READING EVENTS AGAIN]

    Call deferred callback to record trace
  }

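The user space tool will see the kernel stack trace of the first
system call, then it will see that events were dropped, and then it
will see the deferred user space stack trace of the second call.
Without an identifier, it has no way to tell that this user space
trace does not belong to the kernel trace it saw before the drop.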

The identifier is only there for uniqueness for that one thread to let
the tracer know if the deferred trace can be tied to events before it
lost them.
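
As a rough illustration of the tool side, here is a minimal sketch in C.
The struct and function names are made up for this example and are not
part of the patch set; it only assumes that each kernel stack trace
event carries the identifier of the deferred request, and that the
deferred user space trace carries the same identifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping kept by a user space tool. */
struct thread_state {
	bool     have_pending;	/* a kernel trace is waiting for its user trace */
	uint64_t pending_id;	/* identifier seen with that kernel trace */
};

/* A kernel stack trace event was read; remember its identifier. */
static void on_kernel_trace(struct thread_state *t, uint64_t id)
{
	t->have_pending = true;
	t->pending_id = id;
}

/*
 * A deferred user space trace was read. Return true only if it belongs
 * to the kernel trace we are holding. A mismatch means the matching
 * events were lost, so the two traces must not be stitched together.
 */
static bool on_user_trace(struct thread_state *t, uint64_t id)
{
	bool match = t->have_pending && t->pending_id == id;

	t->have_pending = false;
	return match;
}
```

In the dropped events case above, the tool holds the identifier of the
first system call and then reads a user space trace with the identifier
of the second one, so on_user_trace() returns false and the stale
kernel trace is left without a user space half.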

We figured a single 32 bit counter would be good enough when we first
discussed this idea, but we wanted per cpu counters to not have cache
contention every time a CPU wanted to increment the counter. But each
CPU would need an identifier so that a task migrating will not get the
same identifier for a different system call just because it migrated.

We used 16 bits for the CPU identifier, thinking that 64K CPUs would
last some time into the future. We then chose to use a 64 bit number,
which leaves 48 bits for uniqueness, which is plenty.
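
A sketch of that 64 bit layout, for illustration only. Putting the CPU
in the upper 16 bits is an assumption made for this example, and the
macro and function names are hypothetical rather than taken from the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define ID_CPU_BITS	16
#define ID_COUNT_BITS	48
#define ID_COUNT_MASK	((UINT64_C(1) << ID_COUNT_BITS) - 1)

/*
 * Combine a CPU number with that CPU's counter value. Two CPUs that
 * happen to hold the same counter value still produce different
 * identifiers, so a migrating task cannot be handed a duplicate.
 */
static uint64_t make_id64(unsigned int cpu, uint64_t count)
{
	assert(cpu < (1u << ID_CPU_BITS));
	return ((uint64_t)cpu << ID_COUNT_BITS) | (count & ID_COUNT_MASK);
}
```

With 48 bits the counter part repeats only after 2^48 increments on a
single CPU.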

If we use 32 bits, that would give us 64K unique system calls, and it
does seem possible that on a busy system, a tracer could lose 64K
system calls before it gets going again. But we could still use it
anyway, as losing an exact multiple of 64K system calls and then
starting tracing back up again will probably never happen. And if it
does, the worst thing that will happen is the tracer mistaking which
user space stack trace goes with which event. If you are tracing that
many events, this will likely be in the noise.

So I'm fine with making this a 32 bit counter using 16 bits for the CPU
and 16 bits for per thread uniqueness.
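
To make the wraparound concrete, here is a sketch of the 32 bit
variant. As before, the names and the choice of putting the CPU in the
upper half are assumptions for illustration, not the actual
implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 16 bits of CPU number in the upper half, 16 bits of counter in the
 * lower half. The counter is truncated, so it wraps every 65536
 * increments.
 */
static uint32_t make_id32(unsigned int cpu, uint32_t count)
{
	assert(cpu <= UINT16_MAX);
	return ((uint32_t)cpu << 16) | (count & 0xffff);
}
```

Here make_id32(2, 7) and make_id32(2, 7 + 65536) are the same value,
while make_id32(2, 7 + 65535) is not. A false match needs the counter
to advance by an exact multiple of 65536 between the last event seen
before the drop and the first deferred trace seen after it.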

-- Steve
