Well, if it's a port of that, then it won't have the 2-instruction
fast-path that is pretty much required, imho.
I'll see if I can get a free afternoon some day and try to port the
current x86 semaphore code over to a rw version too. The plan was
something like this:
lock ; incl mem
lock ; decl mem
lock ; btsl $31,mem
lock ; andl $0x7fffffff,mem
where all the three contention cases grab a "contention spinlock" before
they then start sorting things out. The only interesting part is making
sure that the contention case gets the wakeups, and the above counts on:
- if a writer is waiting for readers (contention_wr), then the writer
will have already set the high bit, and a reader will know to wake it
up because the rw-semaphore value will be negative when it does
- if a reader is waiting for a writer, then the reader will have
incremented the semaphore, and the writer will know to wake it up
becasue the semaphore value won't be zero after the "write_up()".
- if a writer is waiting for another writer (contention_ww case), it will
have to increment the "reader" part of the semaphore value, in order to
get the other writer to wake it up on "write_up()".
All other races should be trivially handled by just having the spinlock,
so the only really hard cases are the fast-path stuff where we cannot get
the semaphore because it is too expensive.
Does anybody see any holes in the above pseudo-implementation? Please take
a look at the way the current x86 semaphores are implemented: they use
exactly the above kinds of single-atomic-instruction-plus-condition-codes
trickery to get the non-contention case without _any_ extra instructions.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
Please read the FAQ at http://www.tux.org/lkml/