QUESTION: Why might Linux suddenly stop replying to pings for no apparent reason?
From: Terry Phelps
Date: Wed Jul 18 2012 - 10:42:21 EST
I have this strange recurring problem with SEVERAL machines, all
running the Oracle "Unbreakable Enterprise Kernel", which is based on
the 3.0.16 kernel.
Here is a quick description, while I still have your attention:
I have server S1, and two desktops, D1 and D2, separated by a router.
The D1 and D2 boxes are side by side, on the same IPv4 subnet,
different from S1's subnet. Maybe once a day, or more often, I find that
D1 cannot ping S1, but D2 can. There are many possible causes for
that, of course, BUT:
I can SSH to S1 from D2, and S1 can ping both D1 and D2 just fine.
tcpdump shows that the ICMP request packets from D1 ARE arriving at
S1. S1 is simply not replying!
From S1, I can traceroute to D2, but cannot traceroute to D1. A
traceroute to D1 gets an ENETDOWN returned from sendto(). But there is
only one NIC in the S1, and it certainly isn't down!
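The sendto() failure can be checked outside traceroute with a small
probe. The following is only a sketch, not part of the original
report: the function names and the 192.0.2.10 address standing in for
D1 are hypothetical, and UDP port 33434 is assumed because it is the
traditional traceroute base port. It sends one empty UDP datagram and
reports the symbolic errno, if any.

```python
import errno
import socket


def errno_name(code):
    """Map a numeric errno to its symbolic name, e.g. 'ENETDOWN'."""
    return errno.errorcode.get(code, "errno %s" % code)


def probe_sendto(host, port=33434):
    """Send one empty UDP datagram to host.

    Returns None if sendto() succeeded, otherwise the errno name.
    The port default is an assumption (traditional traceroute base port).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(b"", (host, port))
    except OSError as e:
        return errno_name(e.errno)
    return None


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute the address of D1.
    print(probe_sendto("192.0.2.10"))
```

Run on S1 against D1 and then against D2; if the probe reports
ENETDOWN only for D1, the error is tied to the destination rather than
to traceroute itself.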
One more thing: If I enter "ip route flush cache" on S1, the problem
clears up immediately.
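Because the flush destroys the evidence, it may help to capture the
routing state for D1 before running it. The sketch below is an
assumption about what is worth recording, not a known diagnosis: it
runs "ip route get" for the destination and "ip -s route show cache",
and the function names and the 192.0.2.10 placeholder address are
hypothetical.

```python
import subprocess


def cache_inspection_commands(dest):
    """Build the ip(8) commands to run before flushing the cache."""
    return [
        ["ip", "route", "get", dest],             # route chosen for dest
        ["ip", "-s", "route", "show", "cache"],   # cached routes with stats
    ]


def capture_route_state(dest):
    """Run each command and return a list of (argv, output) pairs."""
    results = []
    for argv in cache_inspection_commands(dest):
        try:
            proc = subprocess.run(argv, capture_output=True, text=True)
            results.append((argv, proc.stdout + proc.stderr))
        except FileNotFoundError:
            # The ip binary is not installed or not on PATH.
            results.append((argv, "ip command not found"))
    return results


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute the address of D1.
    for argv, output in capture_route_state("192.0.2.10"):
        print("$ " + " ".join(argv))
        print(output)
```

Comparing the output captured while the problem is present with the
output after the flush would show whether the cached entry for D1
differs from the one the kernel builds afresh.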
And another: If I leave D1 pinging S1 every 5 seconds, say, the
problem will NEVER clear up by itself. But if D1 stops pinging S1 for
a few minutes, it works again!
No, there's no firewall or SELinux running on any machine involved,
and no firewall between the boxes.
I'm totally confused. Can anyone suggest what to look at?