Re: [profile] amortize atomic hit count increments

From: William Lee Irwin III
Date: Tue Sep 14 2004 - 02:59:57 EST


William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
[...]

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> A few comments which describe the design would be nice...

Okay, I'll add a few in another update. I suppose what's going on may
not be as obvious to everyone else even with the code in hand.


On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
>> + local_irq_save(flags);
>> + per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
>> + local_irq_restore(flags);
>> + put_cpu();
>> +}

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> hm. Does an IPI handler need to disable local IRQs?

It's for exclusion from the timer interrupt. It looks like ia32 enters
the calls with interrupts disabled, so it's probably safe to assume
it's called with disabled interrupts for all architectures (or what
architectures don't are broken by other callers elsewhere). I'll send
out an update with the explicit interrupt disablement removed.


William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
>> + down(&profile_flip_mutex);
>> + j = per_cpu(cpu_profile_flip, smp_processor_id());

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> Is this preempt-safe?

Yes. It's irrelevant which cpu's cpu_profile_flip is sampled. But
it's not cpu hotplug -safe, as the cpu may be offlined and the per-cpu
storage freed in the duration between calling smp_processor_id()
and dereferencing the offset from the start of the per-cpu area.
Disabling preemption while it's being sampled (no longer than that is
necessary) would repair it for cpu hotplug, as it would then have a
valid cpu (the one on which it's executing) while the flip state is
being sampled (it can't change because we own the semaphore, and won't
vary by cpu unless the on_each_cpu() is in flight, but we have to have
a valid cpu number to sample it). The cpucontrol semaphore would
be excessively heavyweight and we'd either have to conditionally
compile out the native semaphore for the cpu hotplug case or otherwise
acquire two semaphores in succession.

This raises an interesting question of how on earth for_each_online_cpu()
is handled by cpu hotplug, but I don't feel responsible for answering it.

So, my preferred fix is the following, with which I'll send out an
updated patch if everyone agrees:

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 23:12:27.574463744 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 00:10:29.820081944 -0700
@@ -209,7 +209,8 @@
int i, j, cpu;

down(&profile_flip_mutex);
- j = per_cpu(cpu_profile_flip, smp_processor_id());
+ j = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
for_each_online_cpu(cpu) {
struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/