Re: [PATCH 04/13] tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()

From: Steven Rostedt
Date: Thu Mar 16 2023 - 09:57:06 EST


On Thu, 16 Mar 2023 09:16:37 +0100
Uladzislau Rezki <urezki@xxxxxxxxx> wrote:

> > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > index ef8ed3b65d05..e6037752dcf0 100644
> > --- a/kernel/trace/trace_probe.h
> > +++ b/kernel/trace/trace_probe.h
> > @@ -256,6 +256,7 @@ struct trace_probe {
> > struct event_file_link {
> > struct trace_event_file *file;
> > struct list_head list;
> > + struct rcu_head rcu;
> > };
> >
> > static inline bool trace_probe_test_flag(struct trace_probe *tp,
> >
> struct foo_a {
> int a;
> int b;
> };

Most machines today are 64-bit, even low-end machines.

struct foo_a {
	long long a;
	long long b;
};

is more accurate. That's 16 bytes.

Although it is more likely off because list_head is a double pointer. But
let's just go with this, as the amount really doesn't matter here.

>
> your obj size is 8 byte
>
> struct foo_b {
> struct rcu_head rcu;

Isn't rcu_head defined as:

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
} __attribute__((aligned(sizeof(void *))));
#define rcu_head callback_head

Which makes it 8 bytes, not 16, on 32-bit as well?

> int a;
> int b;
> };

So it should be 8 + 8 = 16 on 32-bit, and 16 + 16 = 32 on 64-bit.
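
The size arithmetic above can be checked from userspace. This is only a sketch: the callback_head below is a stand-in copied from the definition quoted in this mail, and the intptr_t fields are an assumption used to model "word-sized" members so that the same source gives the 32-bit and the 64-bit numbers.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel definition quoted above. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
} __attribute__((aligned(sizeof(void *))));
#define rcu_head callback_head

/* Word-sized fields (assumption): 4 bytes on 32-bit, 8 bytes on 64-bit. */
struct foo_a {
	intptr_t a;
	intptr_t b;
};

struct foo_b {
	struct rcu_head rcu;
	intptr_t a;
	intptr_t b;
};

/* Print the three sizes for whatever ABI this is compiled for. */
void print_sizes(void)
{
	printf("rcu_head=%zu foo_a=%zu foo_b=%zu\n",
	       sizeof(struct rcu_head), sizeof(struct foo_a),
	       sizeof(struct foo_b));
}
```

On the usual ABIs rcu_head is two pointers, so foo_b is exactly twice the size of foo_a, whatever the word size.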

>
> now it becomes 16 + 8 = 24 bytes. In reality a foo_b object
> will be 32 bytes since there is no slab for 24 bytes:
>
> <snip>
> kmalloc-32 19840 19840 32 128 1 : tunables 0 0 0 : slabdata 155 155 0
> kmalloc-16 28857 28928 16 256 1 : tunables 0 0 0 : slabdata 113 113 0
> kmalloc-8 37376 37376 8 512 1 : tunables 0 0 0 : slabdata 73 73 0
> <snip>
>
> if we allocate 512 objects of foo_a it would be 4096 bytes
> in case of foo_b it is 24 * 512 = 12288 bytes.

This is for probe events. We usually allocate 1, maybe 2. Oh, some may even
allocate 100 to be crazy. But each probe event is in reality much larger
(1K perhaps) as each one allocates dentries, inodes, etc. So 8 or 16 bytes
extra is still lost in the noise.
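
For reference, the slab rounding in the quoted numbers can be sketched as below. The size-class table is an assumption based on the common kmalloc layout (powers of two plus 96 and 192, starting at 8 as in the slabinfo output quoted above); the real minimum depends on the architecture and config, and kmalloc_bucket() and total_bytes() are names made up for this illustration.

```c
#include <stddef.h>

/* Assumed kmalloc size classes; only the small ones are modeled here. */
static const size_t kmalloc_classes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Return the smallest modeled size class that fits, or 0 if too large. */
size_t kmalloc_bucket(size_t size)
{
	size_t n = sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]);

	for (size_t i = 0; i < n; i++)
		if (size <= kmalloc_classes[i])
			return kmalloc_classes[i];
	return 0;
}

/* Memory actually consumed by nr_objs objects of obj_size bytes each. */
size_t total_bytes(size_t nr_objs, size_t obj_size)
{
	return nr_objs * kmalloc_bucket(obj_size);
}
```

With this table a 24-byte object lands in the 32-byte class, so 512 of them take 16384 bytes, while a 16-byte object fits its class exactly and 512 of them take 8192 bytes.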

>
> single argument will give you 4096 + 512 * 8 = 8192 bytes
> in terms of memory consumption.

If someone allocates 512 instances, that would be closer to a meg in size
without this change. 8k is probably less than 1%.
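
A rough way to put a number on that ratio is sketched below. The per-event footprint is a parameter, not a measured value, and both function names are hypothetical; the result is in basis points (1/100 of a percent) to stay in integer math.

```c
/* Extra bytes from embedding one rcu_head of head_size in each event. */
unsigned long long rcu_head_extra_bytes(unsigned long long nr_events,
					unsigned long long head_size)
{
	return nr_events * head_size;
}

/*
 * Overhead of extra bytes relative to the existing footprint, in basis
 * points. per_event_bytes is an assumed estimate supplied by the caller.
 */
unsigned long long overhead_basis_points(unsigned long long extra,
					 unsigned long long nr_events,
					 unsigned long long per_event_bytes)
{
	unsigned long long base = nr_events * per_event_bytes;

	if (base == 0)
		return 0;
	return extra * 10000ULL / base;
}
```

For 512 events with a 16-byte rcu_head the extra is 8192 bytes. Against an assumed 2K per event (about a meg in total) that is 78 basis points, or 0.78%; against an assumed 1K per event it is 156 basis points.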

>
> And double argument will not give you better performance compared
> with a single argument.

It will, because it will no longer have to allocate anything if need be.
Note, when it doesn't allocate the system is probably mostly idle and we
don't care about performance, but when it needs allocation, that's likely a
time when performance is a bit more important.
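
The difference can be illustrated with a simplified userspace model. This is not the kernel implementation: every name below is hypothetical, a per-object malloc() stands in for the kernel's bookkeeping storage, and a counter stands in for blocking until a grace period ends. It only shows that an embedded head always has somewhere to queue the object, while the headless form has to block when it cannot get storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct deferred {
	struct deferred *next;
	void *obj;	/* object to free once the grace period is over */
	bool external;	/* true if this node was allocated separately */
};

struct link_model {
	int payload;
	struct deferred rcu;	/* embedded, playing the role of rcu_head */
};

struct deferred *pending;	/* objects waiting for the grace period */
int blocking_waits;		/* times the caller had to wait in place */

static void queue_node(struct deferred *node, void *obj, bool external)
{
	node->obj = obj;
	node->external = external;
	node->next = pending;
	pending = node;
}

/* Two-argument style: the embedded head is the queue node, no allocation. */
void free_deferred_embedded(struct link_model *l)
{
	queue_node(&l->rcu, l, false);
}

/* Single-argument style: needs storage; blocks if none can be had. */
void free_deferred_headless(void *obj, bool can_allocate)
{
	struct deferred *node = can_allocate ? malloc(sizeof(*node)) : NULL;

	if (node) {
		queue_node(node, obj, true);
		return;
	}
	blocking_waits++;	/* stand-in for waiting out a grace period */
	free(obj);
}

/* Free everything that was queued; returns how many objects were freed. */
int grace_period_elapsed(void)
{
	int freed = 0;

	while (pending) {
		struct deferred *node = pending;
		bool external = node->external;

		pending = node->next;
		free(node->obj);	/* also frees an embedded node */
		if (external)
			free(node);
		freed++;
	}
	return freed;
}
```

In this model the embedded path never touches blocking_waits, which is the case that matters when memory is tight.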

-- Steve