Re: [PATCH 3/3] perf record: mmap output file

From: David Ahern
Date: Tue Oct 15 2013 - 10:04:34 EST

Next message: Roger Quadros: "Re: [PATCH 2/7] usb: dwc3: adapt dwc3 core to use Generic PHY Framework"
Previous message: Felipe Balbi: "Re: [PATCH 2/7] usb: dwc3: adapt dwc3 core to use Generic PHYFramework"
In reply to: Ingo Molnar: "Re: [PATCH 3/3] perf record: mmap output file"
Next in thread: Arnaldo Carvalho de Melo: "Re: [PATCH 3/3] perf record: mmap output file"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/8/13 11:59 PM, Ingo Molnar wrote:

> Here are some thoughts on how 'perf record' tracing performance could be
> further improved:
>
> 1)
>
> The use of non-temporal stores (MOVNTQ) to copy the ring-buffer into the
> file buffer makes sure the CPU cache is not trashed by the copying - which
> is the largest 'collateral damage' copying does.
>
> glibc does not appear to expose non-temporal instructions so it's going to
> be architecture dependent - but we could build the copy_user_nocache()
> function from the kernel proper (or copy it - we could even simplify it:
> knowing that only large and page aligned buffers are going to be copied
> with it).
>
> See how tools/perf/bench/mem-mem* does that to be able to measure the
> kernel's memcpy() and memset() function performance.

Forgot about this suggestion as well. Added to the list for v3.
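
For illustration only, a non-temporal copy along the lines suggested above
might look roughly like the sketch below. It is not taken from the kernel's
copy_user_nocache() or from any perf patch: copy_nocache() is a hypothetical
name, the code assumes x86-64 with SSE2 and 16-byte-aligned buffers, and it
uses the SSE2 streaming store (MOVNTDQ, via _mm_stream_si128) rather than
MOVNTQ.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf or the kernel): copy len bytes
 * from src to dst using non-temporal stores so the destination lines
 * bypass the CPU cache. Assumes both pointers are 16-byte aligned,
 * which holds for page-aligned mmap'd buffers.
 */
static void copy_nocache(void *dst, const void *src, size_t len)
{
	const __m128i *s = (const __m128i *)src;
	__m128i *d = (__m128i *)dst;
	size_t chunks = len / 64;	/* 64 bytes (one cache line) per iteration */
	size_t done = chunks * 64;
	size_t i;

	assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);

	for (i = 0; i < chunks; i++) {
		__m128i a = _mm_load_si128(s + 0);
		__m128i b = _mm_load_si128(s + 1);
		__m128i c = _mm_load_si128(s + 2);
		__m128i e = _mm_load_si128(s + 3);

		/* Streaming stores: write-combining, no cache allocation. */
		_mm_stream_si128(d + 0, a);
		_mm_stream_si128(d + 1, b);
		_mm_stream_si128(d + 2, c);
		_mm_stream_si128(d + 3, e);
		s += 4;
		d += 4;
	}

	/* Order the weakly-ordered streaming stores before later stores. */
	_mm_sfence();

	/* Any tail shorter than 64 bytes goes through a regular memcpy(). */
	if (done < len)
		memcpy((char *)dst + done, (const char *)src + done, len - done);
}
```

Whether this beats a plain memcpy() for the ring-buffer to file-buffer copy
would have to be measured; the sketch only shows the mechanism.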

> 2)
>
> Yet another method would be to avoid the copies altogether via the splice
> system-call - see:
>
> git grep splice kernel/trace/
>
> To make splice low-overhead we'd have to introduce a mode to not mmap the
> data part of the perf ring-buffer and splice the data straight from the
> perf fd into a temporary pipe and over from the pipe into the target file
> (or socket).

I looked into splice and it was not clear it would be a good match. First,
perf is set up to pull data from mmaps, and there is not a 1:1 association
between mmaps and fds (fd_in for splice). Second, and more importantly,
splice is also a system call, and it would have to be invoked for each mmap
on each trip through the loop, just as write() is today, so it does not
solve the feedback loop problem.
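
To make the system-call cost concrete, here is a minimal sketch of the
fd -> pipe -> fd pattern being discussed. It is an assumption-laden
illustration, not perf code: splice_through_pipe() is a hypothetical helper,
and fd_in stands for any descriptor that supports splice (per the discussion
above, the perf fd would need a new mode before it could be used this way).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for splice() and SPLICE_F_MOVE */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from fd_in to fd_out through
 * a temporary pipe. Returns the number of bytes moved, or -1 on error.
 */
static ssize_t splice_through_pipe(int fd_in, int fd_out, size_t len)
{
	int p[2];
	ssize_t total = 0;

	if (pipe(p) < 0)
		return -1;

	while (len > 0) {
		/* System call 1: source fd into the pipe. */
		ssize_t n = splice(fd_in, NULL, p[1], NULL, len, SPLICE_F_MOVE);
		ssize_t left;

		if (n < 0) {
			total = -1;
			break;
		}
		if (n == 0)	/* no more data on fd_in */
			break;
		len -= (size_t)n;

		/* System call 2 (or more): drain the pipe into the target. */
		for (left = n; left > 0; ) {
			ssize_t m = splice(p[0], NULL, fd_out, NULL,
					   (size_t)left, SPLICE_F_MOVE);
			if (m <= 0) {
				total = -1;
				goto out;
			}
			left -= m;
			total += m;
		}
	}
out:
	close(p[0]);
	close(p[1]);
	return total;
}
```

Each chunk costs at least two splice() calls, and the helper would still have
to be run for every source on every pass of the main loop, which is the
concern raised above.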

> OTOH non-temporal stores are incredibly simple and memory bandwidth is
> plenty on modern systems so I'd certainly try that route first.

I'll take a look.

David
