Re: [PATCH bpf-next v3 10/14] bpf: Add bitwise atomic instructions

From: Brendan Jackman
Date: Mon Dec 07 2020 - 11:15:12 EST


On Mon, Dec 07, 2020 at 07:58:09AM -0800, Yonghong Song wrote:
>
>
> On 12/7/20 3:28 AM, Brendan Jackman wrote:
> > On Fri, Dec 04, 2020 at 07:21:22AM -0800, Yonghong Song wrote:
> > >
> > >
> > > On 12/4/20 1:36 AM, Brendan Jackman wrote:
> > > > On Thu, Dec 03, 2020 at 10:42:19PM -0800, Yonghong Song wrote:
> > > > >
> > > > >
> > > > > On 12/3/20 8:02 AM, Brendan Jackman wrote:
> > > > > > This adds instructions for
> > > > > >
> > > > > > atomic[64]_[fetch_]and
> > > > > > atomic[64]_[fetch_]or
> > > > > > atomic[64]_[fetch_]xor
> > > > > >
> > > > > > All these operations are isomorphic enough to implement with the same
> > > > > > verifier, interpreter, and x86 JIT code, hence being a single commit.
> > > > > >
> > > > > > The main interesting thing here is that x86 doesn't directly support
> > > > > > > the fetch_ versions of these operations, so we need to generate a
> > > > > > > CMPXCHG loop in the JIT. This requires the use of two temporary
> > > > > > > registers; IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for
> > > > > > > this purpose.
> > > > > >
> > > > > > Change-Id: I340b10cecebea8cb8a52e3606010cde547a10ed4
> > > > > > Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> > > > > > ---
> > > > > > arch/x86/net/bpf_jit_comp.c | 50 +++++++++++++++++++++++++++++-
> > > > > > include/linux/filter.h | 60 ++++++++++++++++++++++++++++++++++++
> > > > > > kernel/bpf/core.c | 5 ++-
> > > > > > kernel/bpf/disasm.c | 21 ++++++++++---
> > > > > > kernel/bpf/verifier.c | 6 ++++
> > > > > > tools/include/linux/filter.h | 60 ++++++++++++++++++++++++++++++++++++
> > > > > > 6 files changed, 196 insertions(+), 6 deletions(-)
> > > > > >
> > > > [...]
> > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > > > index 6186280715ed..698f82897b0d 100644
> > > > > > --- a/include/linux/filter.h
> > > > > > +++ b/include/linux/filter.h
> > > > > > @@ -280,6 +280,66 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
> > > > [...]
> > > > > > +#define BPF_ATOMIC_FETCH_XOR(SIZE, DST, SRC, OFF) \
> > > > > > + ((struct bpf_insn) { \
> > > > > > + .code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
> > > > > > + .dst_reg = DST, \
> > > > > > + .src_reg = SRC, \
> > > > > > + .off = OFF, \
> > > > > > + .imm = BPF_XOR | BPF_FETCH })
> > > > > > +
> > > > > > /* Atomic exchange, src_reg = atomic_xchg((dst_reg + off), src_reg) */
> > > > >
> > > > > > Looks like BPF_ATOMIC_XOR/OR/AND/... are all similar to each other.
> > > > > > The same goes for BPF_ATOMIC_FETCH_XOR/OR/AND/...
> > > > > >
> > > > > > I am wondering whether it makes sense to have
> > > > > > BPF_ATOMIC_BOP(BOP, SIZE, DST, SRC, OFF) and
> > > > > > BPF_ATOMIC_FETCH_BOP(BOP, SIZE, DST, SRC, OFF)
> > > > > > so that we can have fewer macros?
> > > >
> > > > Hmm yeah I think that's probably a good idea, it would be consistent
> > > > with the macros for non-atomic ALU ops.
> > > >
> > > > I don't think 'BOP' would be very clear though, 'ALU' might be more
> > > > obvious.
> > >
> > > BPF_ATOMIC_ALU and BPF_ATOMIC_FETCH_ALU are indeed better.
> >
> > On second thoughts I think it feels right (i.e. it would be roughly
> > consistent with the level of abstraction of the rest of this macro API)
> > to go further and just have two macros BPF_ATOMIC64 and BPF_ATOMIC32:
> >
> > /*
> > * Atomic ALU ops:
> > *
> > * BPF_ADD *(uint *) (dst_reg + off16) += src_reg
> > * BPF_AND *(uint *) (dst_reg + off16) &= src_reg
> > * BPF_OR *(uint *) (dst_reg + off16) |= src_reg
> > * BPF_XOR *(uint *) (dst_reg + off16) ^= src_reg
>
> "uint *" => "size_type *"?
> and give an explanation that "size_type" is either "u32" or "u64"?

"uint *" is already used in the file so I'll follow the precedent there.

>
> > * BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
> > * BPF_AND | BPF_FETCH src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
> > * BPF_OR | BPF_FETCH src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
> > * BPF_XOR | BPF_FETCH src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
> > * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg)
> > * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
> > */
> >
> > #define BPF_ATOMIC64(OP, DST, SRC, OFF) \
> > ((struct bpf_insn) { \
> > .code = BPF_STX | BPF_DW | BPF_ATOMIC, \
> > .dst_reg = DST, \
> > .src_reg = SRC, \
> > .off = OFF, \
> > .imm = OP })
> >
> > #define BPF_ATOMIC32(OP, DST, SRC, OFF) \
> > ((struct bpf_insn) { \
> > .code = BPF_STX | BPF_W | BPF_ATOMIC, \
> > .dst_reg = DST, \
> > .src_reg = SRC, \
> > .off = OFF, \
> > .imm = OP })
>
> You could have
> BPF_ATOMIC(OP, SIZE, DST, SRC, OFF)
> where SIZE is BPF_DW or BPF_W.

Ah sorry, I didn't see this mail and have just posted v4 with the 2
separate macros. Let's see if anyone else has an opinion on
this point.
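
For reference, a size-parameterised variant could look something like the sketch below. It is only an illustration: the struct and constants are local stand-ins that mirror the uapi encoding so that the snippet compiles on its own, and BPF_ATOMIC_ANY is a hypothetical name. I've avoided calling the macro BPF_ATOMIC because that name is already the mode constant used in .code.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the uapi instruction layout (assumption for this sketch). */
struct bpf_insn {
	uint8_t code;      /* opcode */
	uint8_t dst_reg:4; /* destination register */
	uint8_t src_reg:4; /* source register */
	int16_t off;       /* signed offset */
	int32_t imm;       /* signed immediate; carries the atomic op here */
};

static_assert(sizeof(struct bpf_insn) == 8, "BPF instructions are 8 bytes");

/* Encoding values copied here so the sketch is self-contained. */
#define BPF_STX    0x03
#define BPF_W      0x00
#define BPF_DW     0x18
#define BPF_ATOMIC 0xc0
#define BPF_XOR    0xa0
#define BPF_FETCH  0x01

/* Hypothetical single macro: SIZE is BPF_W or BPF_DW. */
#define BPF_ATOMIC_ANY(OP, SIZE, DST, SRC, OFF)			\
	((struct bpf_insn) {					\
		.code    = BPF_STX | (SIZE) | BPF_ATOMIC,	\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off     = OFF,					\
		.imm     = OP })

/* The two fixed-size macros then become thin wrappers. */
#define BPF_ATOMIC64(OP, DST, SRC, OFF) \
	BPF_ATOMIC_ANY(OP, BPF_DW, DST, SRC, OFF)
#define BPF_ATOMIC32(OP, DST, SRC, OFF) \
	BPF_ATOMIC_ANY(OP, BPF_W, DST, SRC, OFF)
```

With this, BPF_ATOMIC64(BPF_XOR | BPF_FETCH, 1, 2, 0) would produce the same encoding as the dedicated BPF_ATOMIC_FETCH_XOR macro with SIZE set to BPF_DW.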

> >
> > The downside compared to what's currently in the patchset is that the
> > user can write e.g. BPF_ATOMIC64(BPF_SUB, BPF_REG_1, BPF_REG_2, 0) and
> > it will compile. On the other hand they'll get a pretty clear
> > "BPF_ATOMIC uses invalid atomic opcode 10" when they try to load the
> > prog, and the valid atomic ops are clearly listed in Documentation as
> > well as the comments here.
>
> This should be fine. As you mentioned, the documentation already lists
> what is supported and what is not...
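
To make the load-time behaviour concrete, the check amounts to something like the sketch below. This is not the verifier code, just a standalone illustration: check_atomic_op_sketch is a made-up name, the opcode values are copied in so that it compiles on its own, and I'm assuming the opcode is printed in hex, which matches the "10" in the message for BPF_SUB.

```c
#include <errno.h>
#include <stdio.h>

/* Opcode values copied here so the sketch is self-contained. */
#define BPF_ADD     0x00
#define BPF_SUB     0x10
#define BPF_OR      0x40
#define BPF_AND     0x50
#define BPF_XOR     0xa0
#define BPF_FETCH   0x01
#define BPF_XCHG    (0xe0 | BPF_FETCH)
#define BPF_CMPXCHG (0xf0 | BPF_FETCH)

/* Hypothetical check: accept only the atomic ops listed in the comment block. */
static int check_atomic_op_sketch(int imm)
{
	switch (imm) {
	case BPF_ADD:
	case BPF_ADD | BPF_FETCH:
	case BPF_AND:
	case BPF_AND | BPF_FETCH:
	case BPF_OR:
	case BPF_OR | BPF_FETCH:
	case BPF_XOR:
	case BPF_XOR | BPF_FETCH:
	case BPF_XCHG:
	case BPF_CMPXCHG:
		return 0;
	default:
		/* Anything else, e.g. BPF_SUB, is rejected with a message. */
		fprintf(stderr, "BPF_ATOMIC uses invalid atomic opcode %02x\n",
			(unsigned int)imm);
		return -EINVAL;
	}
}
```

So an instruction built with BPF_SUB still compiles, but a check of this kind turns it away when the program is loaded.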