Re: net: heap out-of-bounds in fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone

From: Dmitry Vyukov
Date: Mon Mar 06 2017 - 13:52:07 EST


On Mon, Mar 6, 2017 at 6:31 PM, David Ahern <dsa@xxxxxxxxxxxxxxxxxxx> wrote:
> On 3/4/17 1:15 PM, Eric Dumazet wrote:
>> On Sat, 2017-03-04 at 19:57 +0100, Dmitry Vyukov wrote:
>>> On Fri, Mar 3, 2017 at 8:12 PM, David Ahern <dsa@xxxxxxxxxxxxxxxxxxx> wrote:
>>>> On 3/3/17 6:39 AM, Dmitry Vyukov wrote:
>>>>> I am getting heap out-of-bounds reports in
>>>>> fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone while running
>>>>> syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. They all
>>>>> follow the same pattern: an object of size 216 is allocated from
>>>>> ip_dst_cache slab, and then accessed at offset 272/276 within
>>>>> fib6_walk. Looks like type confusion. Unfortunately this is not
>>>>> reproducible.
>>>>
>>>> I'll take a look this weekend or Monday at the latest.
>>>
>>>
>>> I've got some additional useful info on this. I think this is
>>> use-after-free rather than out-of-bounds. I've collected stack where
>>> the route was disposed with call_rcu, see the last "Disposed" stack.
>>> The crash happens when cmpxchg in rt_cache_route replaces an existing
>>> route. And that route seems to have some existing pointers to it
>>> (rt->dst.rt6_next) which fib6_walk uses to get to it after its
>>> deletion.
>>
>> rt_cache_route() deals with IPv4 routes.
>>
>> We somehow mix IPv4 and IPv6 dsts in IPv6 tree.
>>
>> We need to add type safety at IPV6 route insertions to catch the
>> offender.
>>
>
> I've seen something like this before -- a rt was on the gc list but
> still linked in the tables because of some reference.
>
> Dmitry: you seem to have reproduced this a few times. Can you share how
> to run whatever tests you are using?


We have hit it several thousand times in total, but we only get a few
dozen crashes per day across ~80 VMs. So if you try to reproduce it on
a single machine, it can take days to get a single crash.
If you are ready to go that route, here are some instructions on
setting up syzkaller:
https://github.com/google/syzkaller
You also need a kernel built with CONFIG_KASAN.
I am ready to help with resolving any issues.

Another option is for you to give me a patch with some additional
WARNINGs. I can then deploy it to the bots and collect stacks.
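
To make the "type safety at IPv6 route insertions" idea concrete, here is
a rough userspace sketch of the kind of check such a debugging patch might
add. It is only a model: the struct names, the family tag and
fib6_insert_checked() are all hypothetical and are not the kernel's actual
structures or API. It only shows the shape of the check: refuse (and warn
about) any dst that is not an IPv6 route before linking it into the IPv6
tree.

```c
#include <stdio.h>
#include <stddef.h>

/* Userspace model only; none of these are the kernel's real types. */
enum dst_family_model { FAMILY_IPV4 = 4, FAMILY_IPV6 = 6 };

/* Common header shared by both route types (hypothetical tag field). */
struct dst_entry_model {
	enum dst_family_model family;
};

/* Smaller IPv4 route object. */
struct rt4_model {
	struct dst_entry_model dst;	/* must stay the first member */
	int rt_flags;
};

/* Larger IPv6 route object; has a next pointer the walker follows. */
struct rt6_model {
	struct dst_entry_model dst;	/* must stay the first member */
	struct rt6_model *next;
	char extra[56];			/* stands in for IPv6-only fields */
};

struct fib6_node_model {
	struct rt6_model *leaf;		/* head of the per-node route list */
};

/*
 * Link a route into the IPv6 node, but only if it really is an IPv6
 * route. Returns 0 on success, -1 if the object was rejected.
 */
static int fib6_insert_checked(struct fib6_node_model *fn,
			       struct dst_entry_model *dst)
{
	struct rt6_model *rt;

	if (dst->family != FAMILY_IPV6) {
		/* In a real patch this would be a WARN with a stack trace. */
		fprintf(stderr,
			"WARNING: non-IPv6 dst (family %d) inserted into IPv6 tree\n",
			(int)dst->family);
		return -1;
	}

	/* Safe: dst is the first member of struct rt6_model. */
	rt = (struct rt6_model *)dst;
	rt->next = fn->leaf;
	fn->leaf = rt;
	return 0;
}
```

In this model, inserting an rt6_model succeeds and it becomes fn->leaf,
while passing &r4.dst for an rt4_model returns -1 and leaves the list
untouched. In the kernel the useful part would be the stack collected at
the point of the bad insertion, since that would identify the offender.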