Re: atomic RAM ?

From: Arnd Bergmann
Date: Thu Apr 08 2010 - 10:15:31 EST


On Thursday 08 April 2010, Michael Schnell wrote:
> On 04/08/2010 02:14 PM, David Miller wrote:
> > Using the spinlock array idea also doesn't work in userspace
> > because any signal handler that tries to do an atomic on the
> > same object will deadlock on the spinlock.
> >
> Yep. I was afraid of signal issues, too, when thinking about this stuff
> (on and off for several months :) ).
>
> That is why I finally think that a completely hardware-based solution
> for each required atomic operation is necessary, both to do Futex
> (if not using said "atomic region" workaround for non-SMP) and to do SMP.

One really expensive but safe way to do atomic operations is to always
have them done on one CPU only, and provide a mechanism for other CPUs
to ask for an atomic operation using an inter-processor-interrupt.

> I finally think that this might be possible in a decent way with custom
> instructions using a - say - 1K Word internal FPGA memory space. But
> this might need some changes in the non-arch dependent Kernel and/or
> library code as the atomic macros would work on "handles" instead of
> pointers (of course these handles would be the old pointers with
> "normal" archs) and the words used by the macros would need to be
> explicitly allocated and deallocated instead of potentially being just
> static variables - even though the "atomic_allocate" macro would just
> create a static variable for "normal archs" and return its address.

Why can't you do a hash by memory address for this?

I would guess you can define an instruction to atomically set and check
a bit in a shared array of implementation-specific size, by passing in
a token that by convention is the memory address you want to lock.
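
To make the semantics concrete, here is a minimal user-space model of such
a bit array. It is only a sketch: it leans on the host's C11 atomics, which
is exactly what the target hardware lacks, so it shows what the instruction
would do, not how to implement it. The array size (64 bits), the hash
(address shifted right by two, modulo the size) and hashlock_index() are
assumptions made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size: 64 lock bits held in a single word. */
#define HASHLOCK_BITS 64u

/* The shared bit array; one bit per hash bucket. */
static _Atomic uint64_t hashlock_word;

/* Hypothetical hash: drop the low alignment bits, fold into the array. */
static unsigned int hashlock_index(volatile void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	return (unsigned int)((a >> 2) % HASHLOCK_BITS);
}

/* Returns true if we set the bit, false if it was already set. */
static bool hashlock_addr(volatile void *addr)
{
	uint64_t mask = UINT64_C(1) << hashlock_index(addr);
	uint64_t old = atomic_fetch_or_explicit(&hashlock_word, mask,
						memory_order_acquire);

	return !(old & mask);
}

/* Clears the bit that belongs to this address. */
static void hashunlock_addr(volatile void *addr)
{
	uint64_t mask = UINT64_C(1) << hashlock_index(addr);

	atomic_fetch_and_explicit(&hashlock_word, ~mask, memory_order_release);
}
```

Two different addresses may hash to the same bit. That only causes some
extra spinning, never incorrect results, as long as nobody takes two of
these locks at the same time.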

Given two privileged instructions:

/* returns one if we got the lock, zero if someone else holds it */
bool hashlock_addr(volatile void *addr);
void hashunlock_addr(volatile void *addr);

you can do

int atomic_add_return(int i, atomic_t *v)
{
	int temp;

	/* spin until we own the lock bit for this address */
	while (!hashlock_addr(v))
		;
	smp_rmb();
	temp = v->counter;
	temp += i;
	v->counter = temp;
	smp_wmb();
	hashunlock_addr(v);

	return temp;
}

static inline unsigned long __cmpxchg(volatile unsigned long *m,
				      unsigned long old, unsigned long new)
{
	unsigned long retval;

	/* spin until we own the lock bit for this address */
	while (!hashlock_addr(m))
		;
	smp_rmb();
	retval = *m;
	/* store only if the current value matches the expected one */
	if (retval == old) {
		*m = new;
		smp_wmb();
	}
	hashunlock_addr(m);
	return retval;
}

Everything else can be built on top of these two, including the system calls
that are used from user applications. Since you never hold that bit lock for
more than a few cycles, you could do with far fewer than 1K bits; in theory
a single global mutex (ignoring the address entirely) would be enough.
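
As a rough illustration of both points, the sketch below uses one global
lock bit (a C11 atomic_flag standing in for the hardware lock) and derives
two further operations from a compare-and-exchange primitive. The names
cmpxchg_sketch(), xchg_sketch() and fetch_add_sketch() are invented for this
example. It also assumes, as kernel code generally does, that an aligned
word-sized volatile load is a single access.

```c
#include <stdatomic.h>

/* One global lock bit, ignoring the address entirely (user-space stand-in). */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Compare-and-exchange under the global lock; returns the value found. */
static unsigned long cmpxchg_sketch(volatile unsigned long *m,
				    unsigned long old, unsigned long newval)
{
	unsigned long retval;

	while (atomic_flag_test_and_set_explicit(&global_lock,
						 memory_order_acquire))
		;	/* spin: the lock is held for a few instructions only */
	retval = *m;
	if (retval == old)
		*m = newval;
	atomic_flag_clear_explicit(&global_lock, memory_order_release);
	return retval;
}

/* Hypothetical helper: unconditional exchange built from cmpxchg. */
static unsigned long xchg_sketch(volatile unsigned long *m,
				 unsigned long newval)
{
	unsigned long old;

	do {
		old = *m;	/* snapshot, then retry if it changed under us */
	} while (cmpxchg_sketch(m, old, newval) != old);
	return old;
}

/* Hypothetical helper: fetch-and-add built from cmpxchg. */
static unsigned long fetch_add_sketch(volatile unsigned long *m,
				      unsigned long i)
{
	unsigned long old;

	do {
		old = *m;
	} while (cmpxchg_sketch(m, old, old + i) != old);
	return old;
}
```

The retry loops never hold the lock themselves; they only go through
cmpxchg_sketch(), so the lock hold time stays at a handful of instructions
no matter which operation is layered on top.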

That said, a real load-locked/store-conditional would be much more powerful,
in particular because it can also be used from user space, and it is typically
more efficient because it uses the same mechanisms as the cache coherency
protocol.
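
For comparison, this is roughly what the same add-and-return looks like when
written against C11 atomics in user space. On architectures that provide
load-locked/store-conditional, compilers typically expand the weak
compare-exchange into such an instruction pair, and the weak form is allowed
to fail spuriously for that reason. The function name is made up for this
example.

```c
#include <stdatomic.h>

/* Add i to *v and return the new value, retrying until the update lands. */
static int add_return_llsc_style(atomic_int *v, int i)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/* On failure, "old" is refreshed with the current value; just retry. */
	while (!atomic_compare_exchange_weak(v, &old, old + i))
		;
	return old + i;
}
```

No lock bit and no privileged instruction are involved here, so a signal
handler that interrupts the loop cannot deadlock against it.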

Arnd