Re: [PATCH 1/2] perf/powerpc/hv-24x7: Use per-cpu page buffer

From: Sukadev Bhattiprolu
Date: Wed Dec 10 2014 - 17:29:44 EST


Michael Ellerman [mpe@xxxxxxxxxxxxxx] wrote:
| On Tue, 2014-12-09 at 23:06 -0800, Sukadev Bhattiprolu wrote:
| > From 470c16c8955672103a9529c78dffbb239e9e27b8 Mon Sep 17 00:00:00 2001
| > From: Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx>
| > Date: Tue, 9 Dec 2014 22:17:46 -0500
| > Subject: [PATCH 1/2] perf/poweprc/hv-24x7: Use per-cpu page buffer
| >
| > diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
| > index dba3408..18e1f49 100644
| > --- a/arch/powerpc/perf/hv-24x7.c
| > +++ b/arch/powerpc/perf/hv-24x7.c
| > @@ -217,11 +217,14 @@ static bool is_physical_domain(int domain)
| > domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CORE;
| > }
| >
| > +DEFINE_PER_CPU(char, hv_24x7_reqb[4096]);
| > +DEFINE_PER_CPU(char, hv_24x7_resb[4096]);
|
| Do we need it to be 4K aligned also? I would guess so.

Yes, fixed in the patch below.
|
| Rather than declaring these as char arrays and then casting below, can you pull
| the struct definitions up and then declare the per cpu variables with the
| proper type.

Well, the structures, used for communication with HV, have variable length
arrays, like:

struct hv_24x7_request_buffer {
...
struct hv_24x7_request requests[];
};

i.e the buffer needs to be larger than reported by sizeof(). So we
allocate a large buffer and cast it. Not sure if there is a trick to
get DEFINE_PER_CPU() to do that. We could add code to allocate pages
per cpu during init, but that would mean more code, error handling etc.

|
|
| > static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
| > u16 lpar, u64 *res,
| > bool success_expected)
| > {
| > - unsigned long ret = -ENOMEM;
| > + unsigned long ret;
| >
| > /*
| > * request_buffer and result_buffer are not required to be 4k aligned,
| > @@ -243,13 +246,11 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
| > BUILD_BUG_ON(sizeof(*request_buffer) > 4096);
| > BUILD_BUG_ON(sizeof(*result_buffer) > 4096);
| >
| > - request_buffer = kmem_cache_zalloc(hv_page_cache, GFP_USER);
| > - if (!request_buffer)
| > - goto out;
| > + request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
| > + result_buffer = (void *)get_cpu_var(hv_24x7_resb);
| >
| > - result_buffer = kmem_cache_zalloc(hv_page_cache, GFP_USER);
| > - if (!result_buffer)
| > - goto out_free_request_buffer;
| > + memset(request_buffer, 0, 4096);
| > + memset(result_buffer, 0, 4096);
|
| Do we have to memset them? That's not going to speed things up.

I agree about the speed, specially since we have a larger buffer. But we
are reusing the buffer for independent events and some fields need to be 0
(hence the zalloc in the current code).

---