Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace

From: John Stultz
Date: Tue Aug 02 2016 - 13:04:23 EST


On Tue, Aug 2, 2016 at 9:37 AM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Sun, Jul 31, 2016 at 6:42 PM, Lorenzo Colitti <lorenzo@xxxxxxxxxx> wrote:
>> On Sat, Jul 30, 2016 at 1:57 AM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>>>
>>> With the patch reverted, and the system working, I see:
>>>
>>> # ip rule ls
>>> 0: from all lookup local
>>> 10000: from all fwmark 0xc0000/0xd0000 lookup legacy_system
>>> 13000: from all fwmark 0x10063/0x1ffff lookup local_network
>>> 13000: from all fwmark 0x10065/0x1ffff lookup wlan0
>>> 14000: from all oif wlan0 lookup wlan0
>>> 14000: from all oif wlan0 lookup wlan0
>>> 15000: from all fwmark 0x0/0x10000 lookup legacy_system
>>> 16000: from all fwmark 0x0/0x10000 lookup legacy_network
>>> 17000: from all fwmark 0x0/0x10000 lookup local_network
>>> 19000: from all fwmark 0x64/0x1ffff lookup wlan0
>>> 19000: from all fwmark 0x65/0x1ffff lookup wlan0
>>> 22000: from all fwmark 0x0/0xffff lookup wlan0
>>> 32000: from all unreachable
>>
>>
>> This is not correct; you're missing "uidrange 0-0" qualifiers on some
>> of the rules.
>>
>> Does the kernel pass the networking unit tests at
>> https://source.android.com/devices/tech/config/kernel_network_tests.html
>> ? If not, the Android network stack will not work correctly.
>
> So I looked into getting the tests above to run (had to get a UML fix
> for recent kernels).
>
> Against vanilla v4.7 (plus that one UML fix), all the tests pass.
>
> Against linus/master (plus that one UML fix), I see 7 failures, which
> all look very similar:
>
> Traceback (most recent call last):
>   File "./multinetwork_test.py", line 28, in <module>
>     import multinetwork_base
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/multinetwork_base.py", line 84, in <module>
>     HAVE_UID_ROUTING = HaveUidRouting()
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/multinetwork_base.py", line 78, in HaveUidRouting
>     iproute.IPRoute().UidRangeRule(6, False, 1000, 2000, 100, 10000)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py", line 380, in UidRangeRule
>     return self._Rule(version, is_add, RTN_UNICAST, table, nlattr, priority)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py", line 348, in _Rule
>     self._SendNlRequest(command, rtmsg)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py", line 318, in _SendNlRequest
>     super(IPRoute, self)._SendNlRequest(command, data, flags)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py", line 183, in _SendNlRequest
>     self._ExpectAck()
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py", line 170, in _ExpectAck
>     self._ParseAck(response)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py", line 164, in _ParseAck
>     raise IOError(error, os.strerror(-error))
> IOError: [Errno -2] No such file or directory
>
>
> Interestingly, reverting 153380ec4b9b ("fib_rules: Added NLM_F_EXCL
> support to fib_nl_newrule") does not seem to fix it; I get the
> same errors.
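For context on the error above: a netlink ACK is an NLMSG_ERROR message whose payload begins with a signed 32-bit error code. That code is 0 on success or a negative errno on failure. The following is a minimal Python sketch, assuming the nlmsghdr layout from <linux/netlink.h>, that builds and parses such a message. build_ack and parse_ack are hypothetical helpers, not part of the Android test suite. They only mimic the raise in netlink.py's _ParseAck shown in the traceback. The sketch illustrates why the test prints "[Errno -2] No such file or directory": the kernel answered the rule request with -ENOENT, and the helper passes the negative value through as the errno.

```python
import errno
import os
import struct

# Value from <linux/netlink.h> (assumed, not taken from the test suite).
NLMSG_ERROR = 0x2
# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")


def build_ack(error, seq=1, pid=0):
    """Build a synthetic NLMSG_ERROR message carrying `error` (0 means ACK).

    The payload is an int32 error followed by a copy of the original
    request header; here that copied header is all zeroes for illustration.
    """
    payload = struct.pack("=i", error) + NLMSG_HDR.pack(0, 0, 0, 0, 0)
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), NLMSG_ERROR, 0, seq, pid)
    return header + payload


def parse_ack(data):
    """Return 0 for a successful ACK; raise IOError for a negative error code."""
    length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data)
    if length > len(data):
        raise ValueError("truncated netlink message")
    if msg_type != NLMSG_ERROR:
        raise ValueError("expected NLMSG_ERROR, got type %d" % msg_type)
    (error,) = struct.unpack_from("=i", data, NLMSG_HDR.size)
    if error:
        # Same convention as the traceback: errno stays negative,
        # while the message text comes from strerror(-error).
        raise IOError(error, os.strerror(-error))
    return 0
```

For example, parse_ack(build_ack(-errno.ENOENT)) raises an IOError whose errno is -2 and whose message is the ENOENT text. This matches the last line of the traceback.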


So, bisecting between v4.7 and linus/HEAD with the tests above, it looks
like 96c63fa7393d ("net: Add l3mdev rule") is what breaks them.

The l3mdev rule patch is somewhat tangled with the fib_rules one, but if
I revert both of them, the only remaining failure is
./neighbour_test.py (which I still need to dig into). So those two
changes do seem to be connected to the regression I'm seeing with
Android.
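The subject patch concerns NLM_F_EXCL on RTM_NEWRULE. The following is a minimal Python sketch of how such a request is assembled. The constant values are assumed from the Linux UAPI headers (<linux/netlink.h>, <linux/rtnetlink.h>, <linux/fib_rules.h>). build_newrule and nlattr_u32 are hypothetical helpers, not the iproute.py API, and the message is only built, never sent. By the usual rtnetlink convention, NLM_F_CREATE | NLM_F_EXCL asks the kernel to reject the request with EEXIST if it considers an identical object to exist already. Once fib_nl_newrule honors that flag, userspace that sets it may therefore see errors where previously the rule was simply added.

```python
import socket
import struct

# Values assumed from <linux/netlink.h>, <linux/rtnetlink.h>, <linux/fib_rules.h>.
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400
RTM_NEWRULE = 32
FR_ACT_TO_TBL = 1
FRA_PRIORITY = 6

NLMSG_HDR = struct.Struct("=IHHII")
# struct fib_rule_hdr: family, dst_len, src_len, tos, table, res1, res2, action, flags
FIB_RULE_HDR = struct.Struct("=BBBBBBBBI")
NLATTR = struct.Struct("=HH")  # nla_len, nla_type


def nlattr_u32(attr_type, value):
    """Encode a u32 netlink attribute (8 bytes, already 4-byte aligned)."""
    return NLATTR.pack(NLATTR.size + 4, attr_type) + struct.pack("=I", value)


def build_newrule(family, table, priority, exclusive, seq=1):
    """Build an RTM_NEWRULE request for a 'lookup <table>' rule at `priority`.

    `table` must fit in the 8-bit fib_rule_hdr.table field in this sketch.
    """
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE
    if exclusive:
        # Ask the kernel to fail with EEXIST instead of adding a duplicate.
        flags |= NLM_F_EXCL
    body = FIB_RULE_HDR.pack(family, 0, 0, 0, table, 0, 0, FR_ACT_TO_TBL, 0)
    body += nlattr_u32(FRA_PRIORITY, priority)
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(body), RTM_NEWRULE, flags, seq, 0)
    return header + body
```

For example, build_newrule(socket.AF_INET6, 100, 10000, exclusive=True) yields a 36-byte message (16-byte nlmsghdr, 12-byte fib_rule_hdr, 8-byte FRA_PRIORITY attribute) with nlmsg_flags 0x605. Without exclusive, the flags are 0x405.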

thanks
-john