While implementing IPv6 sticky opts I took a look at the UDP
socket locking.
The locking done at the high-level socket layer (net/socket.c) is
not complete I think. sockfd_lookup() calls fget() which increases
file->f_count. That is enough to prevent the socket from going away,
but allows more than one clone to use the socket at one time. This
opens many races in the socket layer, even in the non-SMP case
(with SMP it'll get even worse). For example consider the case of
one thread calling connect, while the other calls sendmsg.
BTW with more stuff moving into the dentries - how about getting rid
of the socket inode and putting a socket pointer directly into the
dentry?
The IPv4 UDP send path calls lock_sock() while building the packet.
I think that is completely useless: the only other parts that check
for this lock are the socket destruction timer and the packet input
bottom half. For the packet input path it's very bad, because collecting
the user packet data might take quite some time (e.g. when it's swapped
out) and it can't process packets while the socket is locked. The
sock_readers lock doesn't help at all against other threads changing
the state of the socket, because the socket layer never checks it, but
only sets it.
The current situation is also:
- socket layer for one socket might be called by many threads, but is
not fully reentrant. This leads to many races between code that sets
certain socket options and the send code that uses it (I hit this
problem with my sticky option code).
- There is only one socket lock, that blocks input processing while
data is sent - that limits parallelism for multithreaded servers.
I propose a new lock for every socket:
- One reader/writer lock that protects socket state that is used by
the data send path. This will prevent many of the clone-races in the
socket layer.
- Use sock_readers only for the input processing.
Comments? I think this cleanup is really needed, especially when
the code is moved outside of the SMP global kernel lock.
-Andi