Re: [PATCH] move smp_rmb() after load of status

From: Willem de Bruijn
Date: Sun Jul 08 2018 - 17:53:49 EST


> @@ -394,25 +394,30 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> static int __packet_get_status(struct packet_sock *po, void *frame)
> {
> union tpacket_uhdr h;
> -
> - smp_rmb();
> + int status;
>
> h.raw = frame;
> switch (po->tp_version) {
> case TPACKET_V1:
> flush_dcache_page(pgv_to_page(&h.h1->tp_status));
> - return h.h1->tp_status;
> + status = h.h1->tp_status;
> + break;
> case TPACKET_V2:
> flush_dcache_page(pgv_to_page(&h.h2->tp_status));
> - return h.h2->tp_status;
> + status = h.h2->tp_status;
> + break;
> case TPACKET_V3:
> flush_dcache_page(pgv_to_page(&h.h3->tp_status));
> - return h.h3->tp_status;
> + status = h.h3->tp_status;
> + break;
> default:
> WARN(1, "TPACKET version not supported.\n");
> BUG();
> - return 0;
> + status = 0;
> + break;
> }
> + smp_rmb();
> + return status;
> }
>
> static __u32 tpacket_get_timestamp(struct sk_buff *skb, struct timespec *ts,
> --
> 2.17.1
>
> Sorry for a lack of information- it is the first my patch.
>
> The opposie smp_wmb() is in: https://elixir.bootlin.com/linux/v4.15/source/net/packet/af_packet.c#L415

__packet_get_status in the common path synchronizes with userspace
code, not with __packet_set_status. See __v2_tx_user_ready in
tools/testing/selftests/net/psock_tpacket.c from an example.

On receive, the smp_wmb in tpacket_rcv ensures that a frame is filled
before the kernel updates the status field to TP_STATUS_USER. So that
is not the purpose of the smp_wmb in __packet_set_status.

Digging through the original PACKET_TX_RING discussion (below)
it appears that the intent of that smp_wmb was intended either to ensure
that the frame was fully written before freeing the skb (when called from
tpacket_destruct_skb) or even to ensure that the frame was fully written
before calling flush_dcache_page on the underlying shared page.

The latter would explain its analogue in __packet_get_status,
though it is not implemented as a barrier between the field access
and page flush in either function in the final patch.

>From http://lists.openwall.net/netdev/2008/10/31/70:

>> Also, when you set the status back to TP_STATUS_KERNEL in the
>> destructor, you need to add the following barriers:
>>
>> __packet_set_status(po, ph, TP_STATUS_KERNEL);
>> smp_mb(); // make sure the TP_STATUS_KERNEL was actually written to
>> memory before this - couldn't this actually be just a smp_wmb?
>> flush_dcache_page(virt_to_page(ph)); // on non-x86 architectures like
>> ARM that have a moronic cache (i.e cache by virtual rather than
>> physical address). on x86 this is a noop.
>>
>
> So, If my understanding of those memory barriers is correct, we should
> have a smp_rmb() before status reading and smp_wmb() after status
> writing in skb destructor and send procedure.

> On my eye smp_rmb() should be moved *after* actual read of status to be ensured that no read after __packet_get_status() happens before actual read of status.
>

That sounds reasonable. But we should understand the intent of the
existing code before making changes. That was not the purpose of this
barrier, it appears.