Re: [PATCH RFC] sched: add notifier for process migration

From: Jeremy Fitzhardinge
Date: Wed Oct 14 2009 - 12:17:36 EST


On 10/14/09 00:05, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:
>
>
>> @@ -1981,6 +1989,12 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>> #endif
>> perf_swcounter_event(PERF_COUNT_SW_CPU_MIGRATIONS,
>> 1, 1, NULL, 0);
>> +
>> + tmn.task = p;
>> + tmn.from_cpu = old_cpu;
>> + tmn.to_cpu = new_cpu;
>> +
>> + atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
>>
> We already have one event notifier there - look at the
> perf_swcounter_event() callback. Why add a second one for essentially
> the same thing?
>
> We should only put a single callback there - a tracepoint defined via
> TRACE_EVENT() - and any secondary users can register a callback to the
> tracepoint itself.
>
> There's many similar places in the kernel - with notifier chains and
> also with a need to get tracepoints there. The fastest (and most
> consistent) solution is to add just a single event callback facility.
>

My specific use case for this notifier is to provide a "you've been
migrated" counter to usermode via a fixmap page, as part of the work to
extend kernel/pvclock.c to implement vread for vsyscall use. I probably
should have referred to that explicitly in the comment for the patch to
give a concrete motivation and rationale.

This means that on applicable systems - i.e., running virtualized under
Xen or KVM - the callback will be installed early in boot and called for
the entire uptime of the system. Since we don't want a strong permanent
coupling between that particular piece of arch-independent scheduler
code and an arch-specific piece of functionality, a notifier seemed like
a good fit.
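
To make the decoupling concrete, here is a rough user-space sketch of
the pattern, not the kernel code. The struct mirrors the tmn fields in
the patch, but register_migration_cb(), notify_migration(),
pvclock_migrated() and migrate_count are names made up for illustration,
and a single global counter is a simplification of whatever would
actually be exported through the fixmap page.

```c
#include <stddef.h>

/* User-space mock of the pattern; none of this is kernel API. */
struct task_migration_notifier {
    int task_id;   /* stand-in for the struct task_struct pointer */
    int from_cpu;
    int to_cpu;
};

typedef void (*migration_cb)(const struct task_migration_notifier *tmn);

#define MAX_CALLBACKS 4
static migration_cb callbacks[MAX_CALLBACKS];
static size_t nr_callbacks;

/* Register a listener; returns 0 on success, -1 if the table is full. */
static int register_migration_cb(migration_cb cb)
{
    if (nr_callbacks == MAX_CALLBACKS)
        return -1;
    callbacks[nr_callbacks++] = cb;
    return 0;
}

/* Generic side: announce the event without knowing who is listening. */
static void notify_migration(int task_id, int from_cpu, int to_cpu)
{
    struct task_migration_notifier tmn = { task_id, from_cpu, to_cpu };

    for (size_t i = 0; i < nr_callbacks; i++)
        callbacks[i](&tmn);
}

/* Arch side: count real migrations (hypothetical usermode-visible value). */
static unsigned int migrate_count;

static void pvclock_migrated(const struct task_migration_notifier *tmn)
{
    if (tmn->from_cpu != tmn->to_cpu)
        migrate_count++;
}
```

The generic side only ever calls notify_migration(); the arch-specific
listener is attached once at startup with
register_migration_cb(pvclock_migrated) and stays registered from then
on.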

(Note that this callback is generally useful on all systems for the
vgetcpu vsyscall; it would allow us to use the "tcache" parameter to
provide results which are both fast and 100% accurate, by deferring the
use of expensive lsl/rdtscp instructions until it *knows* the cpu has
changed.)
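
As a rough illustration of that caching idea, here is a user-space
sketch. It is not the real vgetcpu interface: struct cpu_cache,
cached_getcpu(), slow_getcpu(), migrate_count and current_cpu are all
hypothetical stand-ins, and the sketch ignores how the counter would be
exposed to usermode as well as any memory-ordering concerns.

```c
/* Hypothetical stand-ins; not the real vgetcpu interface. */
static unsigned int migrate_count;  /* would be bumped by the kernel on migration */
static unsigned int current_cpu;    /* what the expensive path would discover */
static unsigned int slow_lookups;   /* counts expensive-path uses, for illustration */

struct cpu_cache {
    unsigned int cpu;         /* last cpu number looked up */
    unsigned int seen_count;  /* migrate_count at the time of that lookup */
    int valid;                /* zero until the first lookup */
};

/* Stand-in for the expensive lsl/rdtscp path. */
static unsigned int slow_getcpu(void)
{
    slow_lookups++;
    return current_cpu;
}

/* Return the cached cpu unless the migration counter has moved. */
static unsigned int cached_getcpu(struct cpu_cache *tc)
{
    /* Sample the counter first, so a migration during the slow lookup
     * leaves a stale count behind and forces a refresh on the next call. */
    unsigned int count = migrate_count;

    if (!tc->valid || tc->seen_count != count) {
        tc->cpu = slow_getcpu();
        tc->seen_count = count;
        tc->valid = 1;
    }
    return tc->cpu;
}
```

With a zero-initialised struct cpu_cache, the first call takes the slow
path and later calls return the cached value until migrate_count
changes.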

I tend to view tracepoints as more of a diagnostic tool: they are
inserted and removed dynamically as a way of instrumenting a running
system, and the tracepoints themselves don't have side-effects required
for correct running of the system.

More handwavingly, I see the semantics of a tracepoint as basically a
flag-fall showing that a particular piece of kernel code has been
called, whereas a notification says that a particular event has occurred
(which may not be associated with any specific piece of code being
executed). This notion of "task X has been migrated from cpu A to B"
seems like a fairly high-level concept; the fact that it can be
implemented by hooking a single piece of code is a side-effect of the
modularity of the scheduler rather than anything relating to the event
itself.

Functionally, tracepoints and notifiers do have broad similarities.
Should they be unified? I don't know, but they do seem to serve
distinct roles.

J