Re: [PATCH bpf-next v4 1/6] bpf: Factorize bpf_trace_printk and bpf_seq_printf

From: Florent Revest
Date: Thu Apr 15 2021 - 05:33:04 EST


On Thu, Apr 15, 2021 at 2:38 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
> On Wed, Apr 14, 2021 at 11:54 AM Florent Revest <revest@xxxxxxxxxxxx> wrote:
> > +static int try_get_fmt_tmp_buf(char **tmp_buf)
> > +{
> > + struct bpf_printf_buf *bufs;
> > + int used;
> > +
> > + if (*tmp_buf)
> > + return 0;
> > +
> > + preempt_disable();
> > + used = this_cpu_inc_return(bpf_printf_buf_used);
> > + if (WARN_ON_ONCE(used > 1)) {
> > + this_cpu_dec(bpf_printf_buf_used);
>
> this makes me uncomfortable. If used > 1, you won't preempt_enable()
> here, but you'll decrease count. Then later bpf_printf_cleanup() will
> be called (inside bpf_printf_prepare()) and will further decrease
> count (which it didn't increase, so it's a mess now).

Awkward, yes. :( This code is untested because it only covers a niche
preempt_rt usecase that is hard to reproduce but I should have thought
harder about these corner cases.

> > + i += 2;
> > + if (!final_args)
> > + goto fmt_next;
> > +
> > + if (try_get_fmt_tmp_buf(&tmp_buf)) {
> > + err = -EBUSY;
> > + goto out;
>
> this probably should bypass doing bpf_printf_cleanup() and
> try_get_fmt_tmp_buf() should enable preemption internally on error.

Yes. I'll fix this and spend some more brain cycles thinking about
what I'm doing. ;)

> > -static __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
> > +BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
> > + u64, arg2, u64, arg3)
> > {
> > + u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
> > + enum bpf_printf_mod_type mod[MAX_TRACE_PRINTK_VARARGS];
> > static char buf[BPF_TRACE_PRINTK_SIZE];
> > unsigned long flags;
> > - va_list ap;
> > int ret;
> >
> > - raw_spin_lock_irqsave(&trace_printk_lock, flags);
> > - va_start(ap, fmt);
> > - ret = vsnprintf(buf, sizeof(buf), fmt, ap);
> > - va_end(ap);
> > - /* vsnprintf() will not append null for zero-length strings */
> > + ret = bpf_printf_prepare(fmt, fmt_size, args, args, mod,
> > + MAX_TRACE_PRINTK_VARARGS);
> > + if (ret < 0)
> > + return ret;
> > +
> > + ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
> > + BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));
> > + /* snprintf() will not append null for zero-length strings */
> > if (ret == 0)
> > buf[0] = '\0';
> > +
> > + raw_spin_lock_irqsave(&trace_printk_lock, flags);
> > trace_bpf_trace_printk(buf);
> > raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
> >
> > - return ret;
>
> see here, no + 1 :(

I wonder if it's a bug or a feature though. The helper documentation
says the helper returns "the number of bytes written to the buffer". I
am not familiar with the internals of trace_printk but if the
terminating \0 is not outputted in the trace_printk buffer, then it
kind of makes sense.

Also, if anyone uses this return value, I can imagine that the usecase
would be if (ret == 0) assume_nothing_was_written(). And if we
suddenly output 1 here, we might break something.

Because the helper is quite old, maybe we should improve the helper
documentation instead? Your call :)