Re: RCU + SMR for preemptive kernel/user threads.

From: Joe Seigh
Date: Wed May 11 2005 - 17:47:06 EST


On Wed, 11 May 2005 08:04:54 -0700, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:

On Tue, May 10, 2005 at 06:40:20PM -0400, Joe Seigh wrote:


In classic RCU, the release is supplied by the context switch. In your
scheme, couldn't you do the following on the update side?

1. Gather up all the hazard pointers.
2. Send IPIs to all other CPUs.
3. Check the hazard pointers gathered in #1 against the
blocks to be freed.

You need to do the IPIs before you look at the hazard pointers.


The read side would do the following when letting go of a hazard pointer:

1. Prevent the compiler from reordering memory references
(the CPU would still be free to do so).
2. Set the hazard pointer to NULL.
3. begin non-critical-section code.

Checking where the IPI is received by the read side:

1. Before this point, the updater would have seen the non-NULL hazard
pointer (if the hazard pointer referenced the data item that was
previously removed).
2. Ditto.
3. Before this point, the hazard pointer could be seen as NULL, but
the read-side CPU will also have stopped using the pointer (since
we are assuming precise interrupts).

The problem is you don't know when the hazard pointer was set to NULL.
It could have been set soon after the IPI interrupt was received and
any outstanding accesses made since the IPI interrupt aren't syncronized
with respect to setting the hazard pointer to null.

But if you looked at the hazard pointer in the IPI interrupt handler,
you could use that information to decide whether you had to wait an
additional RCU interval. So updater logic would be

1. Set global pointer to NULL. // make object unreachable
2. Send IPIs to all other CPUs
(IPI interrupt handler will copy CPU's hazard pointers)
3. Check objects to be freed against copied hazard pointers.
4. There is no step 4. Even if the actual hazard pointers
that pointed to the object is NULL by this point (but not
its copy), you'd still have to wait and addtional RCU
interval so you might as well leave it out as redundant.

This is better. I may try that trick I used to make NPTL condvars
faster to see if I can keep Linux user space version of this from
tanking. It uses unix signals instead of IPIs.




Again, not sure if all CPUs support precise interrupts. The ones that I
am familiar with do, at least for IPIs.

Additionally if you replace any non NULL hazard pointer value you will
need to use
release semantics.

The trick is that the IPI provides the release semantics, but only
when needed. Right?

There might be something you can do to avoid the extra RCU wait but
I'd have to study it a little more to get a better handle on the
trade offs.

True, there will need to be either two RCU waits or two rounds of IPIs.

Yes, it might better be called RCU+SMR+RCU in that case.



I suppose I should do some kind of formal analysis of this. I'm figuring
out
if this technique is interesting enough first before I go through all that
work.

Hard to say without some experimentation.


I've done plenty of that. I have some atomically thread-safe reference
counting impletations and a proxy GC based on those which I compare to
an RCU for user threads implementation. Using lock-free in user space
gives you much more dramatic performance improvements than in the kernel.
It cuts down on unnecessary context switching which can slow things down
considerably. Also mutexes and rwlocks are prone to starvation. Making
them FIFO for guaranteed service order slows them down even further.

I usually use a semaphore to keep the updaters from running out of resources.
It slows down updater throughput but then I'm more concerned with reader
throughput. If I want faster recovery of resources I'll used the atomic
refcounted pointer or the proxy based on it. Slightly slower updater performance
but resources are recovered more quickly.

--
Joe Seigh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/