Re: [patch 1/2] x86_64 page fault NMI-safe

From: Mathieu Desnoyers
Date: Tue Aug 03 2010 - 17:16:53 EST

Next message: Jan Andres: "[PATCH] isofs: Fix lseek() to position beyond 4 GB"
Previous message: Rafael J. Wysocki: "Re: [PATCH] SATA / AHCI: Do not play with the link PM during suspend to RAM (was: Re: HDD not suspending properly / dead on resume)"
In reply to: Ingo Molnar: "Re: [patch 1/2] x86_64 page fault NMI-safe"
Next in thread: Mathieu Desnoyers: "Re: [patch 1/2] x86_64 page fault NMI-safe"

* Ingo Molnar (mingo@xxxxxxx) wrote:
>
> * Ingo Molnar <mingo@xxxxxxx> wrote:
>
> >
> > * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > On Tue, Aug 3, 2010 at 12:45 PM, Mathieu Desnoyers
> > > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> > > >
> > > > The real issue here, IMHO, is that Perf has tied gory ring buffer
> > > > implementation details to the userspace perf ABI, and there is now strong
> > > > unwillingness from Perf developers to break this ABI.
> >
> > (Wrong.)

I am glad to hear this. So should I understand that if we show that the current
perf ABI imposes significant design constraints and results in poor performance
and inability to support flight recorder mode (which is needed to unify the ring
buffers), we can deprecate the ABI?

[...]

> > We may want to add things like a NOP event to pad out the end of page

Or simply write the page (or sub-buffer) size information in a page (or
sub-buffer) header. The gain here is that by doing so we don't have to reserve
an event ID for the NOP event, which costs one reserved value in the ID field
of _each_ event header. You might be tempted to say "oh, it's just a single
value, who cares?", but with the amount of data we're moving, being able to
represent the event header in a very small number of bits really makes a
difference. Bloat creeps in one single bit at a time until we stop caring about
adding whole integers, and by then the game was over long ago: performance
suffers deeply.
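
To make the header idea concrete, here is a minimal user-space sketch. It is
not the perf or LTTng layout: the struct names, field widths, and the 4096-byte
sub-buffer size are all assumptions chosen for illustration. The sub-buffer
header records how many bytes are valid, so the reader stops at that offset and
no event ID has to be set aside for padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBBUF_SIZE 4096u /* hypothetical sub-buffer size */

/* Hypothetical sub-buffer header: number of valid bytes, header included. */
struct subbuf_header {
	uint32_t data_size;
};

struct subbuf {
	struct subbuf_header hdr;
	uint8_t payload[SUBBUF_SIZE - sizeof(struct subbuf_header)];
};

/* Hypothetical compact event header: every ID value stays usable,
 * because none is reserved for a NOP/padding event. */
struct event_header {
	uint8_t id;
	uint8_t len; /* payload length in bytes */
};

static void subbuf_init(struct subbuf *sb)
{
	sb->hdr.data_size = (uint32_t)sizeof(sb->hdr);
}

/* Append one event. Returns 0 on success, -1 if it does not fit; the
 * writer would then move to the next sub-buffer without emitting padding. */
static int subbuf_write(struct subbuf *sb, uint8_t id,
			const void *data, uint8_t len)
{
	size_t need = sizeof(struct event_header) + len;
	struct event_header eh = { id, len };
	uint8_t *p;

	if (sb->hdr.data_size + need > SUBBUF_SIZE)
		return -1;
	p = (uint8_t *)sb + sb->hdr.data_size;
	memcpy(p, &eh, sizeof(eh));
	memcpy(p + sizeof(eh), data, len);
	sb->hdr.data_size += (uint32_t)need;
	return 0;
}

/* Reader side: walk events up to data_size instead of looking for a
 * padding event at the end of the sub-buffer. */
static unsigned int subbuf_count_events(const struct subbuf *sb)
{
	size_t off = sizeof(sb->hdr);
	unsigned int n = 0;

	while (off < sb->hdr.data_size) {
		struct event_header eh;

		memcpy(&eh, (const uint8_t *)sb + off, sizeof(eh));
		off += sizeof(eh) + eh.len;
		n++;
	}
	return n;
}
```

The unused tail of the sub-buffer is simply left unwritten; its size is
implied by SUBBUF_SIZE minus data_size.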

By the way, the huge size of the perf event headers is another factor that
might explain its poor performance.

[...]

> [ The control structure of the mmap area is there for performance/wakeup
> optimizations

I am doubtful about an "optimization" that affects what should be a slow path:
user-space wakeup for delivering multiple events at once. Have you checked
whether this leads to any noticeable performance increase at all?

> (and to allow the kernel to lose information on producer
> overload, while still giving user-space an idea that we lost data and how
> much)

This can be performed with a standard system call rather than playing games
with a shared page into which both the kernel and user-space write. The
advantage is that by letting user-space call the kernel (rather than just
writing "I'm done" in that page by updating the consumer value), we can let the
kernel perform tasks that might enable us to implement flight recorder mode all
within the same ring buffer implementation.
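
As a rough illustration of why an explicit call helps, here is a
single-threaded user-space model. It is only a sketch under assumptions: the
names ring_get_subbuf() and ring_put_subbuf() are hypothetical stand-ins for
system calls, there is no locking, and this is not how perf or any existing
tracer implements it. The point is that the "get" and "put" steps run code on
the ring buffer side, so an overwriting producer can be kept away from the
sub-buffer the consumer holds.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SUBBUF 4 /* hypothetical number of sub-buffers */

struct ring {
	uint64_t seq[NR_SUBBUF]; /* sequence number per sub-buffer, 0 = empty */
	uint64_t next_seq;       /* next sequence number to produce */
	unsigned int write_idx;  /* sub-buffer the producer fills next */
	int reader_idx;          /* sub-buffer held by the consumer, or -1 */
	uint64_t lost;           /* sub-buffers overwritten before being read */
};

static void ring_init(struct ring *r)
{
	for (unsigned int i = 0; i < NR_SUBBUF; i++)
		r->seq[i] = 0;
	r->next_seq = 1;
	r->write_idx = 0;
	r->reader_idx = -1;
	r->lost = 0;
}

/* Producer in overwrite (flight recorder) mode: never blocks, overwrites
 * the oldest data, but skips the sub-buffer the consumer currently holds. */
static void ring_produce(struct ring *r)
{
	unsigned int idx = r->write_idx;

	if ((int)idx == r->reader_idx)
		idx = (idx + 1) % NR_SUBBUF;
	if (r->seq[idx] != 0)
		r->lost++; /* unread data is being overwritten */
	r->seq[idx] = r->next_seq++;
	r->write_idx = (idx + 1) % NR_SUBBUF;
}

/* Stand-in for a "get sub-buffer" system call: hands the oldest unread
 * sub-buffer to the consumer. Returns its index, or -1 if none is
 * available or one is already held. */
static int ring_get_subbuf(struct ring *r)
{
	int oldest = -1;

	if (r->reader_idx >= 0)
		return -1;
	for (int i = 0; i < NR_SUBBUF; i++) {
		if (r->seq[i] != 0 &&
		    (oldest < 0 || r->seq[i] < r->seq[oldest]))
			oldest = i;
	}
	r->reader_idx = oldest;
	return oldest;
}

/* Stand-in for a "put sub-buffer" system call: the consumer is done,
 * so the sub-buffer goes back to the producer as empty. */
static void ring_put_subbuf(struct ring *r)
{
	if (r->reader_idx >= 0) {
		r->seq[r->reader_idx] = 0;
		r->reader_idx = -1;
	}
}
```

With a shared page, the consumer side of this exchange is just a store to a
counter, and nothing on the kernel side gets a chance to run when the consumer
takes or releases data.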

> - it does not affect semantics and does not limit us. ]

Well, so far, the main limitation I can see is that it does not allow us to do
flight recorder tracing (a.k.a. overwrite mode).

>
> So there's no design limitation - Peter simply prefers one possible solution
> over another and outlined his reasons - we should hash that out based on the
> technical arguments.

Another argument I've seen from Peter is that he prefers the perf
kernel-userspace interaction to happen through this shared page to diminish the
number of traced events generated by perf activity. But I find this argument
unconvincing, because it really only applies to system call tracing: the rest of
tracing will be affected by the perf user-space process activity. So we might as
well just bite the bullet and accept that the trace is "polluted" by user-space
perf events. It _is_ using up CPU time anyway, so I think it's actually _better_
to know about it, rather than to try to hide the tracer activity. If one really
wants to filter out the tracer activity, it can be done at post-processing
without problem. But at least the information is there.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com