From: Geliang Tang <tanggeliang@xxxxxxxxxx>
It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call a kfunc to modify the fields of each subflow. In
kernel space, the mptcp_for_each_subflow() helper is used for this:
	mptcp_for_each_subflow(msk, subflow)
		kfunc(subflow);
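For readers less familiar with the kernel side: as far as we understand it, mptcp_for_each_subflow() is a list_for_each_entry() walk over the intrusive list msk->conn_list. The following is a userspace sketch of that shape, not kernel code. struct list_head and container_of() are simplified re-implementations, and struct msk, struct subflow_ctx, for_each_subflow(), set_backup() (standing in for "kfunc") and mark_all_backup() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's struct list_head
 * and container_of(). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical, heavily simplified MPTCP types. */
struct subflow_ctx {
	struct list_head node;	/* linked into msk->conn_list */
	int backup;
};

struct msk {
	struct list_head conn_list;	/* all subflows of this MPTCP socket */
};

/* Same shape as list_for_each_entry(): start at the first node after
 * the list head and stop when the walk wraps back to the head. */
#define for_each_subflow(msk_, sf_)					\
	for (sf_ = container_of((msk_)->conn_list.next,			\
				struct subflow_ctx, node);		\
	     &sf_->node != &(msk_)->conn_list;				\
	     sf_ = container_of(sf_->node.next, struct subflow_ctx, node))

/* Hypothetical stand-in for "kfunc(subflow)": modify one field. */
static void set_backup(struct subflow_ctx *subflow)
{
	assert(subflow != NULL);
	subflow->backup = 1;
}

/* Mark every subflow as backup; returns how many were visited. */
static int mark_all_backup(struct msk *msk)
{
	struct subflow_ctx *subflow;
	int n = 0;

	for_each_subflow(msk, subflow) {
		set_backup(subflow);
		n++;
	}
	return n;
}
```

On an empty conn_list the loop body never runs, because the first node after the head is the head itself.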
But this cannot be done from an MPTCP BPF program yet. As Martin
suggested recently, this conn_list walking + modify-by-kfunc usage
fits the bpf_iter use case.
So this patch adds a new bpf_iter type named "mptcp_subflow" to do
this, implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
_destroy(), and registers these bpf_iter mptcp_subflow helpers in the
MPTCP common kfunc set. Then bpf_for_each() for mptcp_subflow can be
used in a BPF program like this:
	bpf_for_each(mptcp_subflow, subflow, msk)
		kfunc(subflow);
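The new/_next/_destroy protocol can be illustrated outside the kernel. The sketch below is a userspace model, not the patch's implementation: all type and function names (struct iter_subflow, iter_subflow_new(), iter_subflow_next(), iter_subflow_destroy(), sum_ids()) are hypothetical, and assert() merely stands in for the msk_owned_by_me() lockdep annotation. sum_ids() spells out the caller-side pattern that bpf_for_each() roughly packages in a BPF program: one _new() call, a loop on _next() until it returns NULL, then _destroy().

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical, heavily simplified MPTCP types. */
struct subflow_ctx {
	struct list_head node;	/* linked into msk->conn_list */
	int id;
};

struct msk {
	struct list_head conn_list;
	int owned;		/* models "socket lock held by the caller" */
};

/* Hypothetical iterator state: the caller provides the storage, and
 * only the new/next/destroy functions interpret it. */
struct iter_subflow {
	struct msk *msk;
	struct list_head *pos;	/* last node returned, or the list head */
};

static int iter_subflow_new(struct iter_subflow *it, struct msk *msk)
{
	it->msk = NULL;		/* a failed _new() leaves an empty iterator */
	it->pos = NULL;
	if (!msk)
		return -1;	/* a kfunc would typically return a -errno */
	assert(msk->owned);	/* stand-in for msk_owned_by_me() */
	it->msk = msk;
	it->pos = &msk->conn_list;
	return 0;
}

static struct subflow_ctx *iter_subflow_next(struct iter_subflow *it)
{
	/* NULL ends the loop: no socket, or the last node was returned. */
	if (!it->msk || it->pos->next == &it->msk->conn_list)
		return NULL;
	it->pos = it->pos->next;
	return container_of(it->pos, struct subflow_ctx, node);
}

static void iter_subflow_destroy(struct iter_subflow *it)
{
	/* The walk takes no references, so there is nothing to release. */
	it->msk = NULL;
	it->pos = NULL;
}

/* The caller-side pattern, written out by hand. */
static int sum_ids(struct msk *msk)
{
	struct iter_subflow it;
	struct subflow_ctx *subflow;
	int sum = 0;

	iter_subflow_new(&it, msk);
	while ((subflow = iter_subflow_next(&it)) != NULL)
		sum += subflow->id;
	iter_subflow_destroy(&it);
	return sum;
}
```

In this sketch, a failed _new() still leaves the iterator in a state where _next() safely returns NULL and _destroy() is harmless, so the caller never needs a separate error path around the loop.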
Suggested-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
Signed-off-by: Geliang Tang <tanggeliang@xxxxxxxxxx>
Reviewed-by: Mat Martineau <martineau@xxxxxxxxxx>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
---
Notes:
  - v2:
    - Add BUILD_BUG_ON() checks, similar to the ones done with other
      bpf_iter_(...) helpers.
    - Replace msk_owned_by_me() with sock_owned_by_user_nocheck() and
      !spin_is_locked() (Martin).
  - v3:
    - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
    - Remove unneeded !msk check (Martin)
    - Remove lock checks, add msk_owned_by_me() for lockdep (Martin)
    - The following note and 2 questions have been added below.
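The BUILD_BUG_ON() checks mentioned in the v2 notes guard a cast: the BPF program only sees an opaque, fixed-size iterator struct, and the kernel reinterprets that storage as its own private state. A C11 sketch of the same idea follows, using static_assert in place of BUILD_BUG_ON(). The struct names, the two-u64 opaque size, and to_kern() are assumptions for illustration; the exact checks in the patch are not reproduced here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical: the opaque iterator state a BPF program keeps on its
 * stack. The size (two u64s) is an assumption for this sketch. */
struct iter_subflow {
	alignas(8) uint64_t opaque[2];
};

/* Hypothetical: the kernel's private view of that same storage. */
struct iter_subflow_kern {
	alignas(8) void *msk;
	void *pos;
};

/* C11 counterparts of a BUILD_BUG_ON() pair: the private view must fit
 * in, and be aligned like, the opaque storage; otherwise the build
 * fails instead of the iterator overrunning the caller's stack slot. */
static_assert(sizeof(struct iter_subflow_kern) <= sizeof(struct iter_subflow),
	      "kern state does not fit in the opaque iterator");
static_assert(alignof(struct iter_subflow_kern) == alignof(struct iter_subflow),
	      "kern state and opaque iterator alignment differ");

/* The reinterpreting cast that the checks above make size- and
 * alignment-safe. */
static struct iter_subflow_kern *to_kern(struct iter_subflow *it)
{
	return (struct iter_subflow_kern *)it;
}
```

Failing at build time is the point of the design: if the private state ever grows past the opaque size, no kernel with the mismatch can be compiled.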
This new bpf_iter will be used by our future BPF packet schedulers and
path managers. To see how we are going to use them, please check our
export branch [1], especially these two commits:
- "bpf: Add mptcp packet scheduler struct_ops": introduce a new
  struct_ops.
- "selftests/bpf: Add bpf_burst scheduler & test": new test showing
  how the new struct_ops and bpf_iter are being used.
[1] https://github.com/multipath-tcp/mptcp_net-next/commits/export
@BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter
to be used with struct_ops, but only with the two new MPTCP-specific
ones we are going to introduce, not with other struct_ops (TCP CC,
sched_ext, etc.). We are not sure how to do that. Do you perhaps have
examples or documentation you could point us to for putting this
restriction in place?
Also, for one of the two future MPTCP struct_ops, not all callbacks
should be allowed to use this new bpf_iter, because they are called from
different contexts. How can we ensure that such struct_ops callbacks
cannot call the mptcp_subflow bpf_iter, without adding new dedicated
lock-held checks for all callbacks? We understood that something
similar was wanted for sched_ext, but we are not sure whether that code
is ready, nor whether it is going to be accepted.