Regression in 5.3-rc1 and later

From: Chris Clayton
Date: Thu Aug 22 2019 - 09:44:15 EST


Hi everyone,

Firstly, apologies to anyone on the long cc list that turns out not to be particularly interested in the following, but
you were all marked as cc'd in the commit message below.

I've found a problem that isn't present in 5.2 series or 4.19 series kernels, and seems to have arrived in 5.3-rc1. The
problem is that if I suspend (to ram) my laptop, on resume 14 minutes or more after suspending, I have no networking
functionality. If I resume the laptop after 13 minutes or less, networking works fine. I haven't tried to get finer
grained timings between 13 and 14 minutes, but can do if it would help.

ifconfig shows that wlan0 is still up and still has its assigned ip address but, for instance, a ping of any other
device on my network, fails as does pinging, say, kernel.org. I've tried "downing" the network with (/sbin/ifdown) and
unloading the iwlmvm module and then reloading the module and "upping" (/sbin/ifup) the network, but my network is still
unusable. I should add that the problem also manifests if I hibernate the laptop, although my testing of this has been
minimal. I can do more if required.

As I say, the problem first appears in 5.3-rc1, so I've bisected between 5.2.0 and 5.3-rc1 and that concluded with:

[chris:~/kernel/linux]$ git bisect good
7ac8707479886c75f353bfb6a8273f423cfccb23 is the first bad commit
commit 7ac8707479886c75f353bfb6a8273f423cfccb23
Author: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
Date: Fri Jun 21 10:52:49 2019 +0100

x86/vdso: Switch to generic vDSO implementation

The x86 vDSO library requires some adaptations to take advantage of the
newly introduced generic vDSO library.

Introduce the following changes:
- Modification of vdso.c to be compliant with the common vdso datapage
- Use of lib/vdso for gettimeofday

[ tglx: Massaged changelog and cleaned up the function signature formatting ]

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-arch@xxxxxxxxxxxxxxx
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-mips@xxxxxxxxxxxxxxx
Cc: linux-kselftest@xxxxxxxxxxxxxxx
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will.deacon@xxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Russell King <linux@xxxxxxxxxxxxxxx>
Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
Cc: Paul Burton <paul.burton@xxxxxxxx>
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: Mark Salyzyn <salyzyn@xxxxxxxxxxx>
Cc: Peter Collingbourne <pcc@xxxxxxxxxx>
Cc: Shuah Khan <shuah@xxxxxxxxxx>
Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
Cc: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Cc: Huw Davies <huw@xxxxxxxxxxxxxxx>
Cc: Shijith Thotton <sthotton@xxxxxxxxxxx>
Cc: Andre Przywara <andre.przywara@xxxxxxx>
Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@xxxxxxx

arch/x86/Kconfig | 3 +
arch/x86/entry/vdso/Makefile | 9 ++
arch/x86/entry/vdso/vclock_gettime.c | 245 ++++---------------------------
arch/x86/entry/vdso/vdsox32.lds.S | 1 +
arch/x86/entry/vsyscall/Makefile | 2 -
arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 -----------
arch/x86/include/asm/pvclock.h | 2 +-
arch/x86/include/asm/vdso/gettimeofday.h | 191 ++++++++++++++++++++++++
arch/x86/include/asm/vdso/vsyscall.h | 44 ++++++
arch/x86/include/asm/vgtod.h | 75 +---------
arch/x86/include/asm/vvar.h | 7 +-
arch/x86/kernel/pvclock.c | 1 +
12 files changed, 284 insertions(+), 379 deletions(-)
delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
create mode 100644 arch/x86/include/asm/vdso/vsyscall.h

To confirm my bisection was correct, I did a git checkout of 7ac8707479886c75f353bfb6a8273f423cfccb2. As expected, the
kernel exhibited the problem I've described. However, a kernel built at the immediately preceding (parent?) commit
(bfe801ebe84f42b4666d3f0adde90f504d56e35b) has a working network after a (>= 14minute) suspend/resume cycle.

As the module name implies, I'm using wireless networking. The hardware is detected as "Intel(R) Wireless-AC 9260
160MHz, REV=0x324" by iwlwifi.

I'm more than happy to provide additional diagnostics (but may need a little hand-holding) and to apply diagnostic or
fix patches, but please cc me on any reply as I'm not subscribed to any of the kernel-related mailing lists.

Chris