Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

From: Daniel Borkmann
Date: Thu May 06 2021 - 17:38:43 EST


On 5/6/21 10:17 PM, Florent Revest wrote:
On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@xxxxxxxxxxxx> wrote:
On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@xxxxxxxxxxxx> wrote:

The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
per-cpu buffer that they use to store temporary data (arguments to
bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
by the end of their scope with bpf_bprintf_cleanup.

If one of these helpers gets called within the scope of one of these
helpers, for example: a first bpf program gets called, uses

Can we afford having few struct bpf_printf_bufs? They are just 512
bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
only situation where this can occur, right? If someone is doing
bpf_snprintf() and interrupt occurs and we run another BPF program, it
will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
second BPF program, etc. We can't eliminate the probability, but
having a small stack of buffers would make the probability so
miniscule as to not worry about it at all.

Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
the changes are minimal. Nestedness property is preserved for
non-sleepable BPF programs, right? If we want this to work for
sleepable we'd need to either: 1) disable migration or 2) instead of

oh wait, we already disable migration for sleepable BPF progs, so it
should be good to do nestedness level only

actually, migrate_disable() might not be enough. Unless it is
impossible for some reason I miss, worst case it could be that two
sleepable programs (A and B) can be intermixed on the same CPU: A
starts&sleeps - B starts&sleeps - A continues&returns - B continues
and nestedness doesn't work anymore. So something like "reserving a
slot" would work better.

Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?

assuming a stack of buffers, do a loop to find unused one. Should be
acceptable performance-wise, as it's not the fastest code anyway
(printf'ing in general).

In any case, re-using the same buffer for sort-of-optional-to-work
bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
suboptimal, so seems worth fixing this.

Thoughts?

Yes, agree, it would otherwise be really hard to debug. I had the same
thought on why not allowing nesting here given users very likely expect
these helpers to just work for all the contexts.

Thanks,
Daniel

What would you think of just letting the helpers own these 512 bytes
buffers as local variables on their stacks ? Then bpf_prepare_bprintf
would only need to write there, there would be no acquire semantic
(like try_get_fmt_tmp_buf) and the stack frame would just be freed on
the helper return so there would be no bpf_printf_cleanup either. We
would also not pre-reserve static memory for all CPUs and it becomes
trivial to handle re-entrant helper calls.

I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
code but I've not been convinced of its necessity.

I got the impression that extra 512 bytes on the kernel stack is quite
a lot and that's why we have per-cpu buffers. Especially that
bpf_trace_printk() can be called from any context, including NMI.

Ok, I understand.

What about having one buffer per helper, synchronized with a spinlock?
Actually, bpf_trace_printk already has that, not for the bprintf
arguments but for the bprintf output so this wouldn't change much to
the performance of the helpers anyway:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385

These helpers are not performance sensitive so a per-cpu stack of
buffers feels over-engineered to me (and is also complexity I feel a
bit uncomfortable with).

But wouldn't this have same potential of causing a deadlock? Simple example
would be if you have a tracing prog attached to bstr_printf(), and one of
the other helpers using the same lock called from a non-tracing prog. If
it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
mentioned earlier. We've had few prior examples with similar issues [0].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99