Re: IO latency - a special case

From: Trenton D. Adams
Date: Sun Apr 05 2009 - 00:17:47 EST


On Sat, Apr 4, 2009 at 7:13 PM, Theodore Tso <tytso@xxxxxxx> wrote:
> Trenton.  Couple of things to try.  First of all, it looks like your
> application is multi-threaded.  That's why it can drive up the load so
> high, even though "ps" and "top" are only showing one process.  Try
> using the -f flag to strace so you can follow all of the processes and
> threads fork()'ed or clone()'ed from the initial process.
>
> In addition, here's a rather brute-force script that I've used when
> trying to collect data when debugging performance or long-term
> stability problems at customer sites.  Very often it was used on
> production machines where they don't allow random people to poke
> around on them, so this was designed to be given to a sysadmin, who
> would approve running it on their system, and some hours later, we
> would get the tarball, and then try to figure out what the heck was
> going on.
>
> It doesn't have to run out of cron, BTW; it can also be run from the
> command line, with some of the polling intervals made smaller if
> you need finer-grained resolution, or it can run as a stand-alone
> daemon as well.
>
>                                                - Ted

Hi Ted,

I would imagine it is multi-threaded, though I am not positive. I am
asking on IRC right now.
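
A minimal sketch of one way to check this directly, assuming a Linux
system with /proc mounted: each thread of a process appears as a
numeric directory under /proc/<pid>/task. The function names and the
proc_root parameter are made up for illustration (proc_root only exists
so the code can be pointed at a test directory); none of this comes
from the thread itself.

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs found under <proc_root>/<pid>/task.

    Assumes the Linux procfs layout, where every thread of a process
    has its own numeric subdirectory in the process's task directory.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def is_multithreaded(pid, proc_root="/proc"):
    """Return True if the process currently has more than one thread."""
    return len(list_thread_ids(pid, proc_root)) > 1
```

Calling list_thread_ids(os.getpid()) from a running program lists that
program's own threads; passing the PID of the suspect process shows
whether "strace -f" will have more than one thread to follow.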

I will try out your script and the -f flag, and report back once I
have more data. The good thing is that once the problem starts
happening, it keeps happening. I think it persists until reboot, but
I'm not positive. I'm about to try again, so we'll see, as I have not
rebooted since last time.

Thanks.