SUNRPC problem with 2.6.26 and beyond

From: Harry Edmon
Date: Wed Oct 22 2008 - 11:44:42 EST


I have a dual quad-core Xeon system running software (http://www.unidata.ucar.edu/software/ldm) that relays and processes weather data through RPC calls, keeping a queue of data in a memory-mapped file. With kernels before 2.6.26 (for example 2.6.25.17) the system ran just fine. But with every kernel from 2.6.26 through 2.6.27.2 the system runs into a problem after approximately 24 hours. The symptom is that processing slows to a crawl. Using "top" I can see that System time is over 90%, with almost no User or Wait time. If I stop and restart the software, most of the time it gets better - but sometimes it takes a reboot to fix the problem. I have an identical system that only ingests and processes data from remote systems (it does no relaying), and it does not have this problem. I have tried a number of different kernel configurations, but they all show the same problem.
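Since the slowdown only appears after about a day, it may help to log the system-time share periodically so the onset can be timestamped rather than caught by hand in "top". The following is a minimal sketch, assuming the standard layout of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal). The function names, the 60-second interval, and the 0.9 threshold are illustrative choices, not part of LDM or the kernel.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user, nice, system, idle, iowait, irq, softirq, steal
    return [int(x) for x in fields[1:9]]


def system_share(before, after):
    """Fraction of elapsed CPU time spent in system mode between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[2] / total  # index 2 is the 'system' counter


def read_cpu_counters(path="/proc/stat"):
    """Read the current aggregate counters; the first line of the file is 'cpu'."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def monitor(interval=60, threshold=0.9):
    """Print one timestamped line per interval; flag samples at or above threshold."""
    prev = read_cpu_counters()
    while True:
        time.sleep(interval)
        cur = read_cpu_counters()
        share = system_share(prev, cur)
        flag = " <-- high system time" if share >= threshold else ""
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} system={share:.1%}{flag}")
        prev = cur
```

Calling monitor() from a small wrapper script and redirecting its output to a file would give a record of when the system time starts to climb, which can then be lined up against the application's own logs.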

I suspect a problem with SUNRPC. I notice that there were a large number of SUNRPC patches in 2.6.26. I am looking for suggestions on how to pin down which patches are causing the problem. Are there ways to figure out where in the kernel the time is being spent? I am willing to work on isolating the problem, but I need some suggestions on the best way to do it, given the large number of SUNRPC patches in 2.6.26 and the fact that each experiment takes a day.
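To get a feel for the cost of narrowing this down by bisection (for example with git bisect between a known-good and a known-bad kernel), the number of test builds grows only with the base-2 logarithm of the number of candidate commits. A rough sketch of that arithmetic follows. The commit counts used below are hypothetical placeholders, not the actual number of SUNRPC changes in 2.6.26, and the one-day-per-test figure comes from the roughly 24 hours the problem takes to appear.

```python
import math


def bisect_steps(n_commits):
    """Approximate number of builds a binary search needs over n_commits candidates."""
    if n_commits <= 1:
        return 0
    return math.ceil(math.log2(n_commits))


def bisect_days(n_commits, days_per_test=1.0):
    """Approximate wall-clock days when every test run takes days_per_test."""
    return bisect_steps(n_commits) * days_per_test


# Hypothetical candidate counts, for illustration only.
for n in (50, 200, 10000):
    print(f"{n} commits: about {bisect_steps(n)} tests, {bisect_days(n):.0f} days")
```

Restricting the candidates to commits that touch the suspected code keeps the count at the low end of this range, but even a search across a whole release stays in the range of a couple of weeks at one test per day.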
--

Dr. Harry Edmon E-MAIL: harry@xxxxxxxxxxxxxxxxxxxx
206-543-0547 harry@xxxxxxxxxxxxxx
Dept of Atmospheric Sciences FAX: 206-543-0308
University of Washington, Box 351640, Seattle, WA 98195-1640
