Re: [PATCH RFC tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

From: Paul E. McKenney
Date: Tue Jul 29 2014 - 14:20:03 EST


On Tue, Jul 29, 2014 at 07:31:21PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 09:33:12AM -0700, Paul E. McKenney wrote:
> > > > o There are no memory barriers, but a synchronize_sched()
> > > > should take care of that, given that this counter is
> > > > incremented with interrupts disabled.
> > >
> > > Well, there's obviously the actual context switch, which should imply an
> > > actual MB such that tasks are self ordered even when execution continues
> > > on another cpu etc..
> >
> > True enough, except that it appears that the context switch happens
> > after the ->nvcsw increment, which means that it doesn't help RCU-tasks
> > guarantee that if it has seen the increment, then all prior processing
> > has completed. There might be enough stuff prior the increment, but I
> > don't see anything that I feel comfortable relying on. Am I missing
> > some ordering?
>
> There's the smp_mb__before_spinlock() + raw_spin_lock_irq(), but that's
> about it I suppose.

Which does only smp_wmb(), correct? :-(

> > > > So I should be able to snapshot the task_struct structure's ->nvcsw
> > > > field and avoid the added code in the fastpaths.
> > > >
> > > > Seem plausible, or am I confused about the role of ->nvcsw?
> > >
> > > Nope, that's the 'I scheduled to go to sleep' counter.
> >
> > I am assuming that the "Nope" goes with "am I confused" rather than
> > "Seem plausible" -- if not, please let me know. ;-)
>
> Yah :-)
>
> > > There is of course the 'polling' issue I raised in a further email...
> >
> > Yep, and other flavors of RCU go to lengths to avoid scanning the
> > task_struct lists. Steven said that updates will be rare and that it
> > is OK for them to have high latency and overhead. Thus far, I am taking
> > him at his word. ;-)
> >
> > I considered interrupting the task_struct polling loop periodically,
> > and would add that if needed. That said, this requires nailing down the
> > task_struct at which the vacation is taken. Here "nailing down" does not
> > simply mean "prevent from being freed", but rather "prevent from being
> > removed from the lists traversed by do_each_thread/while_each_thread."
> >
> > Of course, if there is some easy way of doing this, please let me know!
>
> Well, one reason I'm not liking this is because its in an anonymous
> context (kthread) which makes accounting for it hard.
>
> I feel we're doing far too much async stuff already and it keeps getting
> worse and worse. Ideally we'd be able to account every cycle of kernel
> 'overhead' to a specific user action.

Hmmm...

In theory, we could transfer the overhead of the kthread for a given grace
period to the task invoking the corresponding synchronize_rcu_tasks().
In practice, the overhead might need to be parceled out among several
tasks that concurrently invoked synchronize_rcu_tasks(). Or I suppose
that the overhead could be assigned to the first such task that woke
up, on the theory that things would even out over time.

So exactly how annoyed are you about the lack of accounting? ;-)

> Another reason is that I fundamentally dislike polling stuff.. but yes,
> I'm not really seeing how to do this differently, partly because I'm not
> entirely sure why we need this to begin with. I'm not sure what problem
> we're solving.

As I recall it...

Steven is working on some sort of tracing infrastructure that involves
dynamically allocated trampolines being inserted into some/all functions.
The trampoline code can be preempted, but never does voluntary context
switches, and presumably never calls anything that does voluntary
context switches.

Easy to insert a trampoline, but the trick is removing them.

The thought is to restore the instructions at the begining of the
function in question, wait for an RCU-tasks grace period, then dispose
of the trampoline.

Of course, you could imagine disabling preemption or otherwise entering
an RCU read-side critical section before transferring to the trampoline,
but this was apparently a no-go due to the overhead for small functions.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/