[PATCH 3.12 034/142] dccp/tcp: fix routing redirect race

From: Jiri Slaby
Date: Mon Apr 10 2017 - 12:08:12 EST


From: Jon Maxwell <jmaxwell37@xxxxxxxxx>

3.12-stable review patch. If anyone has any objections, please let me know.

===============

[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

#8 [] page_fault at ffffffff8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.
#9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

224 static bool tcp_in_quickack_mode(struct sock *sk)â
225 {â
226 â const struct inet_connection_sock *icsk = inet_csk(sk);â
227 â const struct dst_entry *dst = __sk_dst_get(sk);â
228 â
229 â return (dst && dst_metric(dst, RTAX_QUICKACK)) ||â
230 â â (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);â
231 }â

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps 267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@xxxxxxxxxx>
Cc: Hannes Sowa <hsowa@xxxxxxxxxx>
Signed-off-by: Jon Maxwell <jmaxwell37@xxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Jiri Slaby <jslaby@xxxxxxx>
---
net/dccp/ipv4.c | 3 ++-
net/dccp/ipv6.c | 8 +++++---
net/ipv4/tcp_ipv4.c | 3 ++-
net/ipv6/tcp_ipv6.c | 8 +++++---
4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 4332b7c25af0..67f0f0652641 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -263,7 +263,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)

switch (type) {
case ICMP_REDIRECT:
- dccp_do_redirect(skb, sk);
+ if (!sock_owned_by_user(sk))
+ dccp_do_redirect(skb, sk);
goto out;
case ICMP_SOURCE_QUENCH:
/* Just silently ignore these. */
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 736fdedf9c85..c3ae00de1740 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -132,10 +132,12 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
np = inet6_sk(sk);

if (type == NDISC_REDIRECT) {
- struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
+ if (!sock_owned_by_user(sk)) {
+ struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);

- if (dst)
- dst->ops->redirect(dst, sk, skb);
+ if (dst)
+ dst->ops->redirect(dst, sk, skb);
+ }
goto out;
}

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c67d89ccadf7..129af2aa04d9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -393,7 +393,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)

switch (type) {
case ICMP_REDIRECT:
- do_redirect(icmp_skb, sk);
+ if (!sock_owned_by_user(sk))
+ do_redirect(icmp_skb, sk);
goto out;
case ICMP_SOURCE_QUENCH:
/* Just silently ignore these. */
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e5bafd576a13..7bec37d485d4 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -386,10 +386,12 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
np = inet6_sk(sk);

if (type == NDISC_REDIRECT) {
- struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
+ if (!sock_owned_by_user(sk)) {
+ struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);

- if (dst)
- dst->ops->redirect(dst, sk, skb);
+ if (dst)
+ dst->ops->redirect(dst, sk, skb);
+ }
goto out;
}

--
2.12.2