Re: [PATCHv4 2/2] arm64/io: Add a header for mmio access instrumentation

From: Sai Prakash Ranjan
Date: Mon Nov 29 2021 - 08:52:05 EST


Hi Arnd,

On 11/22/2021 9:13 PM, Sai Prakash Ranjan wrote:
On 11/22/2021 9:05 PM, Arnd Bergmann wrote:
On Mon, Nov 22, 2021 at 3:59 PM Sai Prakash Ranjan
<quic_saipraka@xxxxxxxxxxx> wrote:
And if we do move this instrumentation to asm-generic/io.h, how will
that be executed since
the arch specifc read{b,w,l,q} overrides this generic version?
As I understand it, your version also requires architecture specific
changes, so that would be the same: it only works for architectures
that get the definition of readl()/readl_relaxed()/inl()/... from
include/asm-generic/io.h and only override the __raw version. Arnd
Sorry, I didn't get this part, so  I am trying this on ARM64:

arm64/include/asm/io.h has read{b,l,w,q} defined.
include/asm-generic/io.h has below:
    #ifndef readl
    #define readl readl
    static inline u32 readl(const volatile void __iomem *addr)

and we include asm-generic/io.h in arm64/include/asm/io.h at the end
after the definitions for arm64 mmio accesors.
So arch implementation here overrides generic ones as I see it, am I
missing something? I even confirmed this
with some trace_printk to generic and arch specific definitions of readl
and I see arch specific ones being called.
Ah, you are right that the arm64 version currently has custom definitions
of the high-level interfaces. These predate the introduction of the
__io_{p,}{b,a}{r,w} macros and are currently only used on risc-v.

I think in this case you should start by changing arm64 to use the
generic readl() etc definitions, by removing the extra definitions and
using

#define __io_ar(v) __iormb(__v)
#define __io_bw() dma_wmb()



Sure, will do that.

I got the callback version implemented as suggested by you to compare the overall size and performance
with the inline version and apparently the size increased more in case of callback version when compared to
inline version. As for the performance, I ran some basic dd tests and sysbench and didn't see any noticeable
difference between the two implementations.

**Inline version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
   text                   data                    bss dec             hex         filename
23884219        14284468         532568 38701255        24e88c7 vmlinux

**Callback version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
   text                  data                     bss dec                      hex        filename
24108179        14279596         532568 38920343        251e097 vmlinux

$ ./scripts/bloat-o-meter inline-vmlinux callback-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=25812612, After=26043292, chg +0.89%

Note: I had arm64 move to asm-generic high level accessors in both versions which I plan to post together but
not included in below links,

For your reference, here are the 2 versions of the patches,
 Inline version - https://github.com/saiprakash-ranjan/RTB-Patches/blob/main/0001-asm-generic-io-Add-logging-support-for-MMIO-accessor.patch
 Callback version - https://github.com/saiprakash-ranjan/RTB-Patches/blob/main/0001-asm-generic-io-Add-callback-based-MMIO-logging-suppo.patch

Couple of things noted in callback version which didn't look quite good was that it needed some way to register the
callbacks and need to use some initcall (core_initcall used) as implemented in above patch but that would probably
mean we would lose some register logging(if there is some) in between early_initcall(which is when trace events get
enabled) and this trace_readwrite core_initcall. Another thing I noticed that since we now move to callback based
implementations, the caller info is somewhat inaccurate when compared to inline version.

Also regarding your earlier comment on inline versions being possibly problematic on lower memory systems, enabling
Ftrace itself is making a considerable size difference and in such systems ftrace wouldn't be enabled which implicitly means
MMIO logging which are based on trace events will be disabled anyways.

Here is the size delta with FTRACE enabled and disabled with arm64 defconfig without MMIO logging support:
$ ./scripts/bloat-o-meter ftrace-disabled-vmlinux ftrace-enabled-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=22865837, After=25215198, chg +10.27%

Given all this, I would prefer inline versions in asm-generic high level accessors, let me know what you think.

Thanks,
Sai