On Thu, Jun 12, 2025 at 6:25 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:...>>>> +BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL);
+BTF_KFUNCS_END(io_uring_kfunc_set)
This is not safe in general.
The verifier doesn't enforce argument safety here.
As a minimum you need to add KF_TRUSTED_ARGS flag to all kfunc.
And once you do that you'll see that the verifier
doesn't recognize the cqe returned from bpf_io_uring_get_cqe*()
as trusted.
Thanks, will add it. If I read it right, without the flag the
program can, for example, create a struct io_ring_ctx on stack,
fill it with nonsense and pass to kfuncs. Is that right?
No. The verifier will only allow a pointer to struct io_ring_ctx
to be passed, but it may not be fully trusted.
The verifier has 3 types of pointers to kernel structures:
1. ptr_to_btf_id
2. ptr_to_btf_id | trusted
3. ptr_to_btf_id | untrusted
1st was added long ago for tracing and gradually got adopted
for non-tracing needs, but it has a foot gun, since
all pointer walks keep ptr_to_btf_id type.
It's fine in some cases to follow pointers, but not in all.
Hence 2nd variant was added and there
foo->bar dereference needs to be explicitly allowed
instead of allowed by default like for 1st kind.
All loads through 1 and 3 are implemented as probe_read_kernel.
while loads from 2 are direct loads.
So kfuncs without KF_TRUSTED_ARGS with struct io_ring_ctx *ctx
argument are likely fine and safe, since it's impossible
to get this io_ring_ctx pointer by dereferencing some other pointer.
But better to tighten safety from the start.
We recommend KF_TRUSTED_ARGS for all kfuncs and
eventually it will be the default.
index 9494e4289605..400a06a74b5d 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -2,6 +2,7 @@
#include <linux/bpf_verifier.h>
#include "io_uring.h"
+#include "memmap.h"
#include "bpf.h"
#include "register.h"
@@ -72,6 +73,14 @@ struct io_uring_cqe *bpf_io_uring_extract_next_cqe(struct io_ring_ctx *ctx)
return cqe;
}
+__bpf_kfunc
+void *bpf_io_uring_get_region(struct io_ring_ctx *ctx, u64 size__retsz)
+{
+ if (size__retsz > ((u64)ctx->ring_region.nr_pages << PAGE_SHIFT))
+ return NULL;
+ return io_region_get_ptr(&ctx->ring_region);
+}
and bpf prog should be able to read/write anything in
[ctx->ring_region->ptr, ..ptr + size] region ?
Populating (creating) dynptr is probably better.
See bpf_dynptr_from*()
but what is the lifetime of that memory ?