Re: Linux 5.12-rc7

From: Eric Dumazet
Date: Mon Apr 12 2021 - 13:38:58 EST


On Mon, Apr 12, 2021 at 7:31 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 4/12/21 9:31 AM, Eric Dumazet wrote:
> > On Mon, Apr 12, 2021 at 6:28 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On Sun, Apr 11, 2021 at 10:14 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>
> >>> Qemu test results:
> >>> total: 460 pass: 459 fail: 1
> >>> Failed tests:
> >>> sh:rts7751r2dplus_defconfig:ata:net,virtio-net:rootfs
> >>>
> >>> The failure bisects to commit 0f6925b3e8da ("virtio_net: Do not pull payload in
> >>> skb->head"). It is a spurious problem - the test passes roughly every other
> >>> time. When the failure is seen, udhcpc fails to get an IP address and aborts
> >>> with SIGTERM. So far I have only seen this with the "sh" architecture.
> >>
> >> Hmm. Let's add in some more of the people involved in that commit, and
> >> also netdev.
> >>
> >> Nothing in there looks like it should have any interaction with
> >> architecture, so that "it happens on sh" sounds odd, but maybe it's
> >> some particular interaction with the qemu environment.
> >
> > Yes, maybe.
> >
> > I spent a few hours on this, and suspect a buggy memcpy() implementation
> > on SH, but this was not conclusive.
> >
>
> I replaced all memcpy() calls in skbuff.h with calls to
>
> static inline void __my_memcpy(unsigned char *to, const unsigned char *from,
>                                unsigned int len)
> {
>         while (len--)
>                 *to++ = *from++;
> }
>
> That made no difference, so unless you have some other memcpy() in mind that
> seems to be unlikely.


Sure. Note that I also had this change applied:

diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9364b076e2d72f04dc3de2d7e2f11..4e05a32542dd606aaaaee8038017fea949939c0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5938,7 +5938,7 @@ static void gro_pull_from_frag0(struct sk_buff *skb, int grow)

 	BUG_ON(skb->end - skb->tail < grow);

-	memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);
+	memmove(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);

 	skb->data_len -= grow;
 	skb->tail += grow;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c421c8f809256f7b13a8b5a1331108449353ee2d..41796dedfa9034f2333cf249a0d81b7250e14d1f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2278,7 +2278,7 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len)
 					      skb_frag_off(f) + offset - start,
 					      copy, p, p_off, p_len, copied) {
 				vaddr = kmap_atomic(p);
-				memcpy(to + copied, vaddr + p_off, p_len);
+				memmove(to + copied, vaddr + p_off, p_len);
 				kunmap_atomic(vaddr);
 			}
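
To spell out what the memcpy() -> memmove() change probes: the two only
differ when source and destination overlap. The user-space sketch below
is purely illustrative and is not kernel code; shift_up() and
forward_copy() are hypothetical names. It shows that a forward byte
loop, like the __my_memcpy() quoted above, gives a different result
from memmove() when the destination starts inside the source range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: move the first 'len' bytes of 'buf' up by
 * 'shift' bytes inside the same buffer. Source and destination
 * overlap whenever shift < len, so memmove() is required.
 */
static void shift_up(unsigned char *buf, size_t len, size_t shift)
{
	assert(buf != NULL);
	memmove(buf + shift, buf, len);
}

/*
 * Hypothetical helper: naive forward byte copy. If 'to' points into
 * [from, from + len), bytes are overwritten before they are read,
 * so the beginning of the source gets replicated.
 */
static void forward_copy(unsigned char *to, const unsigned char *from,
			 size_t len)
{
	while (len--)
		*to++ = *from++;
}
```

If neither variant changes the behaviour, an overlapping copy at these
two call sites is probably not the culprit, although that alone does
not rule out other copy paths.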


>
> > By pulling one extra byte, the problem goes away.
> >
> > Strange thing is that the udhcpc process does not go past sendto().
> >
>
> I have been trying to debug that one. Unfortunately gdb doesn't work with sh,
> so I can't use it to debug the problem. I'll spend some more time on this today.

Yes, I think this is the real issue here. This smells like some memory
corruption.

In my traces, the packet is correctly received in the AF_PACKET queue.

I have checked that the skb is well formed.

But user space never seems to call poll() and recvmsg() on this
AF_PACKET socket.
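
For reference, the receive pattern one would expect to show up in a
trace looks roughly like the sketch below. This is an assumption about
the general shape of such a loop, not a copy of the udhcpc source, and
wait_and_recv() is a hypothetical name. It works on any datagram file
descriptor; for the real case the fd would be an AF_PACKET socket,
which needs CAP_NET_RAW to create.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: wait up to 'timeout_ms' for a datagram on 'fd',
 * then read it with recvmsg().
 * Returns the number of bytes received, 0 on timeout, -1 on error.
 * (A zero-length datagram is indistinguishable from a timeout here.)
 */
static ssize_t wait_and_recv(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	int ready;

	assert(buf != NULL);

	ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0)
		return -1;	/* poll() failed, errno is set */
	if (ready == 0)
		return 0;	/* nothing arrived before the timeout */

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	return recvmsg(fd, &msg, 0);
}
```

If the process never reaches the poll() step at all, that would be
consistent with it being stuck earlier, for example in or right after
sendto().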