Re: [PATCH] fix race in AF_UNIX

From: Eric W. Biederman
Date: Thu Jun 21 2007 - 11:25:00 EST


Miklos Szeredi <miklos@xxxxxxxxxx> writes:

> [CC'd Al Viro and Alan Cox, restored patch]
>
>> > There are races involving the garbage collector, that can throw away
>> > perfectly good packets with AF_UNIX sockets in them.
>> >
>> > The problems arise when a socket goes from installed to in-flight or
>> > vice versa during garbage collection. Since gc is done with a
>> > spinlock held, this only shows up on SMP.
>> >
>> > Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxx>
>>
>> I'm going to hold off on this one for now.
>>
>> Holding all of the read locks kind of defeats the purpose of using
>> the per-socket lock.
>>
>> Can't you just lock purely around the receive queue operation?
>
> That's already protected by the receive queue spinlock. The race
> however happens _between_ pushing the root set and marking of the
> in-flight but reachable sockets.
>
> If in that space any of the AF_UNIX sockets goes from in-flight to
> installed into a file descriptor, the garbage collector can miss it.
> If we want to protect against this using unix_sk(s)->readlock, then we
> have to hold all of them for the duration of the marking.
>
> Al, Alan, you have more experience with this piece of code. Do you
> have better ideas about how to fix this?

I haven't looked at the code closely enough to be confident of
changing something in this area. However the classic solution to this
kind of gc problem is to mark things that are manipulated during
garbage collection as dirty (not orphaned).

It should be possible to fix this problem by simply changing gc_tree
when we perform a problematic manipulation of a passed socket, such
as installing a passed socket into the file descriptors of a process.

Essentially the idea is moving the current code in the direction of
an incremental gc algorithm.


If I understand the race properly. What happens is that we dequeue
a socket (which has packets in it passing sockets) before the
garbage collector gets to it. Therefore the garbage collector
never processes that socket. So it sounds like we just
need to call maybe_unmark_and_push or possibly just wait for
the garbage collector to complete when we do that and the packet
we have pulled out



So just looking at this quickly. It looks like we need to hold
the u->readlock mutex in garbage.c while looking at the receive
queue. Otherwise we may be processing a packet and have
the file descriptors in limbo. Either that or we need
to slightly extend the scope of the receive queue lock on the
receive side.

Additionally we need to modify unix_notinflight to mark the sockets
as inuse, or to at least wait for the garbage collection to complete.
While being very careful to not add a deadlock scenario.

The require changes to fix this without adding heavy handed locking
look a bit nasty to make.

I half suspect switching to one of the simpler incremental garbage
collection algorithms might not be equally easy to implement and
give us more performance at the same time. Having a garbage collector
that can block has advantages.

The practical problem is you can't modify the list of pushed object
aka gc_tree without holding a lock. And we never drop the
unix_table_lock.


Anyway hopefully that is enough fodder for someone to come up with
a light weight fix for this problem.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/