Re: KASAN: use-after-free Read in br_mdb_ip_get

From: Dmitry Vyukov
Date: Thu Feb 21 2019 - 05:54:57 EST


On Wed, Feb 20, 2019 at 11:23 AM Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 28, 2019 at 09:28:36AM +0100, Dmitry Vyukov wrote:
> >
> > > Weird, this is the kfree() on the error path of br_multicast_new_group()
> > > when rhashtable_lookup_insert_fast() fails, which means the entry should
> > > not be linked in the rhashtable or the hlist.
> > > All other frees are via kfree_rcu. I'll keep looking..
> >
> > Humm.... +rhashtable.c maintianers
> >
> > The code in br_multicast_new_group is effectively:
> >
> > mp = kzalloc(sizeof(*mp), GFP_ATOMIC);
> > err = rhashtable_lookup_insert_fast(&br->mdb_hash_tbl, &mp->rhnode,
> > br_mdb_rht_params);
> > if (err)
> > kfree(mp);
> >
> > So it looks like rhashtable_lookup_insert_fast both returned an error
> > and left the new object linked in the table. Since it happened only
> > once, it may have something to do with concurrent resizing/shrinking.
>
> I've looked through the rhashtable code in question and everything
> looks OK. So I suspect some earlier corruption has occured to cause
> this anomalous result.


Taking into account that this still happened only once, I tend to
write it off onto a previous silent memory corruption (we have dozens
of known bugs that corrupt memory). So if several people already
looked at it and don't see the root cause, it's probably time to stop
spending time on this until we have more info.

Although, there was also this one:
https://groups.google.com/d/msg/syzkaller-bugs/QfCCSxdB1aM/y2cn9IZJCwAJ
I have not checked if it can be the root cause of this report, but it
points suspiciously close to this stack and when I looked at it, it
the report looked legit.


> Is it possible to collect earlier alloc/free stack traces on the object in question?

You mean even before the alloc/free of the current incarnation this
object? This looks challenging from memory consumption point of view.
Also how many of them will we print in reports? Also the page can go
through page_alloc and then tracking will be even more challenging.
And in the end the object can be corrupted by an out-of-bounds write
or a like. Heap object reuse quarantine should take care of the common
case, and I don't see a reasonable way to handle all possible corner
cases...