Re: [0/3] net: Kill skb_copy_datagram_const_iovec

From: Al Viro
Date: Tue Nov 04 2014 - 00:45:24 EST


On Mon, Nov 03, 2014 at 03:05:53PM -0500, David Miller wrote:

> I'll see if I can make some progress converting the networking over
> to iov_iter. It can't be that difficult... albeit perhaps a little
> time consuming.

FWIW, I have a queue that got started back in April; basically, the plan
of attack was
* separate kernel-side and userland msghdr.
* localize ->msg_iov uses - most of that gets taken care of by
several new helpers, as in
new helper: skb_copy_datagram_msg()

Absolute majority of skb_copy_datagram_iovec() callers (49 out of 56)
are passing it msg->msg_iov as iovec. Provide a trivial wrapper that
takes msg as argument instead of iovec.
and several like that (the numbers in the above are probably incorrect these
days - it was done more than half a year ago).
* switch kernel-side msghdr to iov_iter. That means diverging
layouts; it's really not hard, since we have copying of msghdr from
userland already localized. Initially - just a mechanical conversion
(i.e. direct uses of iov_iter fields instead of ->msg_iov/->msg_iovlen;
note that after the introduction of wrappers the number of such places
is very much reduced).
* start converting those relatively few places to iov_iter primitives.

And that's where it got stalled, since we have to deal with expectations
of callers. Syscall ones are trivial; that's not a problem. Unfortunately,
there are kernel_{send,recv}msg() users, and those do care about the state the
iovec is left in. Strictly speaking, the state of iovec after e.g.
->sendmsg() is undefined. And it's not just protocol-dependent - unless
I'm seriously misreading it, tcp_sendmsg() ends up modifying iovec in case
when it hits tcp_send_rcvq(), while in the normal case it leaves iovec
unmodified. So in general you need to feed ->{send,recv}msg() a throwaway copy
of iovec. Leads to wonders like
/* NB we can't trust socket ops to either consume our iovs
* or leave them alone. */
LASSERT (niov > 0);

for (nob = i = 0; i < niov; i++) {
scratchiov[i] = iov[i];
nob += scratchiov[i].iov_len;
}
LASSERT (nob <= conn->ksnc_rx_nob_wanted);

rc = kernel_recvmsg(conn->ksnc_sock, &msg,
(struct kvec *)scratchiov, niov, nob, MSG_DONTWAIT);
etc. However, there are places that don't bother and do this:
while (total_rx < data) {
rx_loop = kernel_recvmsg(conn->sock, &msg, iov_p, iov_len,
(data - total_rx), MSG_WAITALL);
if (rx_loop <= 0) {
pr_debug("rx_loop: %d total_rx: %d\n",
rx_loop, total_rx);
return rx_loop;
}
total_rx += rx_loop;
pr_debug("rx_loop: %d, total_rx: %d, data: %d\n",
rx_loop, total_rx, data);
}
(that's iscsit_do_rx_data()). Maybe it's a bug; maybe it's relying on
specific behaviour of the protocol known to be used - this code clearly
expects recvmsg to advance iovec, which seems to depend only on the
protocol. At the moment. In any case, it's very brittle...

Hell knows; I hadn't finished digging through that zoo - got sidetracked back
then. *IF* all such places either use a throwaway copy or assume that iovec
gets modified, we can do the following: switch the access to iovecs to
iov_iter primitives, with the first kind of callers creating a throwaway
iov_iter and the second just feeding the same iov_iter to e.g.
kernel_recvmsg(). iovec will remain constant, iov_iter will be advanced.
Moreover, in a lot of cases of first kind will be able to get rid of
throwaway iov_iter (and of manually advancing it), effectively converting
to the second one.

If we have places that currently rely on iovec remaining unchanged (i.e.
manually advancing it after kernel_{send,recv}msg()), the series will be
more painful ;-/ I very much hope that no such places exist...

FWIW, there is also a tactical question that needs to be dealt with. We
can, of course, start with renaming the "kernel-side" (i.e. post
copy_msghdr_from_user()/get_compat_msghdr()) to struct kmsghdr. OTOH,
that's a _lot_ of churn for very little reason - most of the instances
in the tree are of that kind. So I did it the other way round - introduced
struct user_msghdr (only in linux/socket.h; note that we do *not* have
struct msghdr in uabi/linux/socket.h, or anywhere else in uabi/*),
made the syscalls take pointers to it and (initially) rely upon the identical
layouts in copy_msghdr_from_user(); once we put iov_iter into kernel-side
msghdr, we'll just do it like get_compat_msghdr() does.

Is that acceptable? It would greatly reduce the amount of churn in net/* -
we don't need to pass iov_iter separately and most of the functions in
the middle of call chains are completely unchanged. Only the originators
of ->sendmsg()/->recvmsg() and the places doing actual data copying
need to be touched. OTOH, it makes for kernel struct msghdr looking
odd - instead of normal ->msg_iov and ->msg_iovlen it would have
->msg_iov_iter, with ->sendmsg()/->recvmsg() callers needing to set it
up... OTTH, the things *are* odd from userland programmer POV - sendmsg(2)
and recvmsg(2) leave the iovec unchanged, and having it changed unpredicatably
in the kernel-side counterparts seems to make for a nasty trap. Certainly
makes for a bunch of nasty comments in the code using those...

Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/