[Question] Many kernel errors "neighbour: ndisc_cache: neighbor table overflow!"

From: Jack Wang
Date: Thu Jun 25 2020 - 00:54:21 EST


Hi Folks,

In one of our large clusters, more servers were added to increase
capacity, and many pservers now report the error message below:
"neighbour: ndisc_cache: neighbor table overflow!"

We've tested increasing the gc_thresh values in sysctl.conf as shown
below; after a reboot, the errors are gone:

+# Threshold when garbage collector becomes more aggressive about
+# purging entries. Entries older than 5 seconds will be cleared
+# when over this number. Default: 512
+net.ipv4.neigh.default.gc_thresh2 = 4096
+net.ipv6.neigh.default.gc_thresh2 = 4096
+
+# Maximum number of non-PERMANENT neighbor entries allowed. Increase
+# this when using large numbers of interfaces and when communicating
+# with large numbers of directly-connected peers. Default: 1024
+net.ipv4.neigh.default.gc_thresh3 = 8192
+net.ipv6.neigh.default.gc_thresh3 = 8192
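
To see how close a host is to gc_thresh3 before and after such a
change, a rough sketch along these lines may help. The helper name
neigh_usage is made up for illustration (it is not an existing tool),
and the count is only approximate: it covers what "ip neigh show"
lists in the current network namespace, which may not match the
kernel's internal accounting exactly.

```shell
#!/bin/sh
# neigh_usage (hypothetical helper): read `ip neigh show` output on stdin
# and print "entries/limit", where limit is passed as the first argument.
neigh_usage() {
    limit=$1
    # Count every non-empty line as one neighbor entry.
    awk -v limit="$limit" 'NF { n++ } END { printf "%d/%d\n", n, limit }'
}

# Example on a live host (IPv6 table, i.e. ndisc_cache):
#   ip -6 neigh show | neigh_usage "$(sysctl -n net.ipv6.neigh.default.gc_thresh3)"
# Example for the IPv4 table (arp_cache):
#   ip -4 neigh show | neigh_usage "$(sysctl -n net.ipv4.neigh.default.gc_thresh3)"
```

A count that sits near the limit on the affected hosts would be
consistent with the overflow message above.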

But we still have many systems running in production, so my question
is: is it safe to apply these settings on the fly while servers are
handling heavy traffic, or do we have to apply them only through
sysctl.conf at boot?
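
To be concrete about what "on the fly" means here, below is a minimal
sketch of the runtime change, using the same values as the sysctl.conf
diff above. The function name apply_neigh_thresholds and the APPLY
switch are made up for illustration. By default it only prints the
commands; whether actually running them on a loaded host is safe is
exactly the open question.

```shell
#!/bin/sh
# apply_neigh_thresholds (hypothetical helper): print, or with APPLY=1 run,
# the `sysctl -w` commands matching the sysctl.conf values in the diff above.
apply_neigh_thresholds() {
    for fam in ipv4 ipv6; do
        for kv in gc_thresh2=4096 gc_thresh3=8192; do
            cmd="sysctl -w net.$fam.neigh.default.$kv"
            if [ "${APPLY:-0}" = 1 ]; then
                $cmd            # writes the value; requires root
            else
                echo "$cmd"     # dry run: only show what would be executed
            fi
        done
    done
}

# Dry run:   apply_neigh_thresholds
# Real run:  APPLY=1 apply_neigh_thresholds
```

Values written this way do not survive a reboot, so the sysctl.conf
entries would still be needed for persistence.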

Most of our servers with default settings are running kernels
4.14.137 to 4.14.154.

Thanks in advance!

Best regards!

Jack Wang