Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog

From: Andre Tomt (LKML)
Date: Mon Sep 28 2015 - 22:12:50 EST


On 26. sep. 2015 22:56, Greg Kroah-Hartman wrote:
4.1-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Anastasov <ja@xxxxxx>

[ Upstream commit 2c17d27c36dcce2b6bf689f41a46b9e909877c21 ]

Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:
<snip>

Several of our systems running 4.1.9-rc1 with this patch applied are experiencing hangs that require a hardware/sysrq reset. Reverting it fixes the hangs completely.

4.2 includes this patch as well but I have no such problems there. 4.2.2-rc1 works fine as well.

For now I think this patch should be reverted in 4.1.9.

The hangs have occurred so far on Xen PV and KVM x86_64 virtual machines; they hang completely within minutes or hours, depending on the type of workload. The workloads are all fairly light: one running low-traffic email/antispam, another running monitoring and metrics of ~5 hosts, and one running a single terminal IRC client. All but the IRC one will hang within a few minutes of booting.

When they lock up they only respond to sysrq, with ttyS0/hvc0 not echoing back anything typed in, and are completely dead on the network. One system managed to report RCU stalls but no backtraces (I'll look over the debug config, if there is any interest).

I have yet to hit it on my bare metal desktop, but that might be entirely down to a different type of workload.

Something missing in 4.1?
