Re: MM deadlock [was: Re: arca-vm-8...]

Linus Torvalds (torvalds@transmeta.com)
Mon, 11 Jan 1999 10:10:35 -0800 (PST)


On Mon, 11 Jan 1999, MOLNAR Ingo wrote:
>
> ps. not that it counts too much, but my embarrasing email brought me to
> really closely check all the semaphore code again, and it seems to be
> all really cool and accurate. I've 'unfolded' the code into pseudocode
> by hand:

The not cool part about my implementation of recursive semaphores is:

CPU #1 CPU#2

down()
..
up()

"owner" is P#1, count=1

down() down()
lock ; decl succeeds

lock ; decl fails
goto slow part
slow part sees stale owner
returns success

"owner" is P#2

BOOM! - both CPU's are now inside the critical region.

The problem is that "owner" can be stale, and can be updated outside the
semaphore spinlock by a successful down() - which means that there are no
synchronizing primitives there to guarantee that the slow part sees the
new owner.

Now, setting the new owner is done in the very next instruction after
getting the semaphore, so CPU#1 has to be really fast, and CPU#2 has to be
really slow for the above to happen. But it's still possible, and CPU#2
taking an interrupt could cause that to happen.

And I don't see any way of getting rid of it without another spinlock. I
_could_ possibly do it with something like

up(sem) {
if (!sem.count)
sem.owner = 0;
old_up();
}

but I can't convince myself that that always works either.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/