Re: Regarding _skb_refdst memory alloc/dealloc

From: Chris Packham
Date: Tue May 03 2022 - 01:14:20 EST


+ Dave and Wen

On 3/05/22 15:10, Lokesh Dhoundiyal wrote:

> Hi,
>
> I have the tunnel destination entry set via skb_dst_set inside
> ip_tunnel_rcv. I wish to release the memory referenced by
> skb->_skb_refdst after use.
>
> Could you please advise which API to use for this? I am assuming that
> it is skb_dst_drop. Is that correct?

A bit more context. We've been seeing a memory leak that seems to have
appeared when we updated our Linux kernel from v4.4.16 to v5.7.19. The
test scenario involves learning OSPF routes over a tunnel. I don't
imagine there's anything particularly special about OSPF, just that it
uses multicast traffic to communicate.

Some debugging pointed us at the kmalloc-256 slab and kmemleak seemed to
confirm the suspicion.

unreferenced object 0x8000000044beb900 (size 256):
  comm "softirq", pid 0, jiffies 4294984455 (age 35.980s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 80 00 00 00 05 13 74 80 ..............t.
    80 00 00 00 04 9b bf f9 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<00000000f83947e0>] __kmalloc+0x1e8/0x300
    [<00000000b7ed8dca>] metadata_dst_alloc+0x24/0x58
    [<0000000081d32c20>] __ipgre_rcv+0x100/0x2b8
    [<00000000824f6cf1>] gre_rcv+0x178/0x540
    [<00000000ccd4e162>] gre_rcv+0x7c/0xd8
    [<00000000c024b148>] ip_protocol_deliver_rcu+0x124/0x350
    [<000000006a483377>] ip_local_deliver_finish+0x54/0x68
    [<00000000d9271b3a>] ip_local_deliver+0x128/0x168
    [<00000000bd4968ae>] xfrm_trans_reinject+0xb8/0xf8
    [<0000000071672a19>] tasklet_action_common.isra.16+0xc4/0x1b0
    [<0000000062e9c336>] __do_softirq+0x1fc/0x3e0
    [<00000000013d7914>] irq_exit+0xc4/0xe0
    [<00000000a4d73e90>] plat_irq_dispatch+0x7c/0x108
    [<000000000751eb8e>] handle_int+0x16c/0x178
    [<00000000a0c43b3e>] put_object+0x20/0xd8
    [<000000009439acbb>] scan_gray_list+0x18c/0x268

It appears that the leak is due to commit c0d59da79534 ("ip_gre: Make
none-tun-dst gre tunnel store tunnel info as metadat_dst in recv").
Prior to c0d59da79534 we'd only allocate a new dst if tunnel->collect_md
were true but now we'll also allocate one if tnl_params->daddr == 0.
When ip_route_input_mc() is eventually called it calls skb_dst_set(),
which overwrites skb->_skb_refdst without releasing what was already
there, so the metadata dst allocated earlier is leaked.

A naive fix would be to call skb_dst_drop() in ip_route_input_mc() just
before calling skb_dst_set() (hence Lokesh's question) but I'm worried
we've missed something. I can't rule out that this has already been
fixed or is due to other changes in our kernel fork. I can't see
anything that says "Fixes: c0d59da79534" so if it has been fixed,
c0d59da79534 doesn't appear to have been noted as the culprit. I've
asked Lokesh to try to reproduce the problem with the latest kernel so
we can rule out any changes we've made and confirm that the leak still
exists.
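
To make the suspected mechanism concrete, here is a small userspace
sketch. It is not kernel code: toy_dst, toy_skb and the toy_* helpers
are made-up stand-ins that only model the reference counting, on the
assumption that the set helper overwrites the slot without releasing
the previous entry and the drop helper releases it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not the kernel types. */
struct toy_dst { int refcnt; };
struct toy_skb { struct toy_dst *dst; };	/* models skb->_skb_refdst */

static void toy_dst_release(struct toy_dst *d)
{
	if (d) {
		assert(d->refcnt > 0);
		d->refcnt--;
	}
}

/* Models skb_dst_set(): overwrites the slot, previous entry is NOT released. */
static void toy_skb_dst_set(struct toy_skb *skb, struct toy_dst *d)
{
	skb->dst = d;
}

/* Models skb_dst_drop(): releases the current entry and clears the slot. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
	toy_dst_release(skb->dst);
	skb->dst = NULL;
}

/*
 * Models one received multicast packet. Returns the number of
 * references that were never released (0 means no leak).
 */
static int toy_receive(int drop_before_set)
{
	struct toy_dst metadata = { 1 };	/* reference held from allocation */
	struct toy_dst route = { 1 };
	struct toy_skb skb = { NULL };

	toy_skb_dst_set(&skb, &metadata);	/* tunnel receive path */
	if (drop_before_set)
		toy_skb_dst_drop(&skb);		/* the naive fix */
	toy_skb_dst_set(&skb, &route);		/* multicast route lookup */
	toy_skb_dst_drop(&skb);			/* skb is freed */

	return metadata.refcnt + route.refcnt;
}
```

In this model toy_receive(0) returns 1 (the metadata entry is never
released) and toy_receive(1) returns 0. Whether dropping at that point
is the right place in the real code is exactly what I'm unsure about.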

I wanted to get this out now just in case it rings any bells or if
someone has got a tunnel+multicast setup that might show the problem.

Thanks,
Chris