possible latency issues in seq_read

From: Chris Friesen
Date: Fri Jul 20 2007 - 17:15:50 EST



We've run into an issue (on 2.6.10) where calling "lsof" triggers lost packets on our server. Preempt is disabled, and NAPI is enabled.

For some reason the networking softirq is not being handled in a timely fashion, so the rx ring buffer fills up and incoming packets are dropped.

It appears that the problem path is:

seq_read
tcp_seq_next
established_get_next
read_lock/read_unlock

The issue appears to be related to how long this syscall takes. While we're in the syscall, the ksoftirqd thread cannot run, so the rx ring buffer is not being drained.

The fact that there are kmalloc(GFP_KERNEL) calls in seq_read() seems to indicate that sleeping is safe, so would it be reasonable to call schedule() periodically (maybe based on elapsed time) to ensure that system latency is kept under control?
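
To make the idea concrete, here is a rough userspace sketch of a time-based yield inside a long walk. It is not kernel code and not a patch against seq_read(): walk_with_yield(), count_visit(), now_ns() and the 1 ms budget are all hypothetical names and values chosen for illustration, and sched_yield() merely stands in for schedule(). In the kernel the analogous call would presumably be cond_resched(), and it would have to happen outside the read_lock/read_unlock section, since sleeping with that lock held is not allowed.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical latency budget: 1 ms, an arbitrary illustrative value. */
#define YIELD_BUDGET_NS 1000000LL

/* Monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Example visitor: counts how many entries were walked. */
static void count_visit(size_t idx, void *arg)
{
    (void)idx;
    (*(size_t *)arg)++;
}

/*
 * Walk nentries entries, calling visit() on each one. Every 64 entries,
 * check the clock; if more than budget_ns has passed since the last
 * yield, give up the CPU. Returns the number of times we yielded.
 */
static size_t walk_with_yield(size_t nentries,
                              void (*visit)(size_t, void *),
                              void *arg, long long budget_ns)
{
    size_t yields = 0;
    long long last = now_ns();

    for (size_t i = 0; i < nentries; i++) {
        visit(i, arg);

        /* Read the clock only occasionally to keep the overhead low. */
        if ((i & 63) == 63 && now_ns() - last > budget_ns) {
            sched_yield();      /* stand-in for schedule() */
            yields++;
            last = now_ns();
        }
    }
    return yields;
}
```

A caller would pass YIELD_BUDGET_NS as the budget, for example walk_with_yield(n, count_visit, &count, YIELD_BUDGET_NS). Checking elapsed time rather than an entry count keeps the worst-case latency roughly bounded regardless of how expensive each entry is.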

Thanks,

Chris