Re: PROBLEM: pthread-safety bug in write(2) on Linux 2.6.x

From: Kyle Moffett
Date: Thu Apr 13 2006 - 06:28:25 EST


On Apr 13, 2006, at 05:56:08, Andrew Morton wrote:
> Dan Bonachea <bonachead@xxxxxxxxxxx> wrote:
>
>> This problem arose in the parallel runtime system for a scientific language compiler (nearly a million lines of code total - definitely a "real-world" program) - the example code is merely a pared-down demonstration of the problem. In parallel scientific computing, it's very common for many threads to be writing to stdout (usually for monitoring purposes) and it's expected and normal for output from separate threads to be arbitrarily interleaved, but it's *not* ok for output to be lost entirely. This is essentially equivalent to the real-world example you gave of many threads logging to a file.
>
> Interesting - afaik that's the first time this has been hit in a real application.

I would guess that it could also be a problem with a wide variety of Perl CGIs running under Apache2+mod_perl with the worker threading model. In fact, this may explain some odd behavior I got from one such module, which I "fixed" by switching to logging via syslog. I don't remember which module it was or what the exact problem was, but I think it was similar in nature. That is a somewhat surprising and nonintuitive failure mode for logging; IMHO it would be nice to get it fixed. I would imagine that this has been hit a number of times before, made to go away by changed userspace locking or subtle thread ordering, and written off as the mysterious effects of cosmic rays on RAM.
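
For anyone who wants to sidestep the problem in userspace for now, here is a minimal sketch of the kind of locking mentioned above. It rests on an assumption this thread has not confirmed: that the lost output comes from concurrent write(2) calls racing on the shared file offset of one descriptor. All names in it (locked_write, writer, run_writers, struct writer_arg) are hypothetical and made up for illustration; this is not the reporter's test program. Build with -pthread.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* One process-wide lock serializing every write to the shared descriptor. */
static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread bookkeeping for this sketch. */
struct writer_arg {
    int fd;      /* shared descriptor all threads write to */
    int id;      /* thread number, used only in the log text */
    int nlines;  /* how many lines this thread emits */
    long bytes;  /* bytes written so far, or -1 on error */
};

/* Write the whole buffer while holding the lock, retrying on EINTR
 * and on short writes. Returns 0 on success, -1 on error. */
static int locked_write(int fd, const char *buf, size_t len)
{
    int rc = 0;

    pthread_mutex_lock(&out_lock);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        buf += n;
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&out_lock);
    return rc;
}

/* Thread body: emit nlines short log lines through locked_write(). */
static void *writer(void *p)
{
    struct writer_arg *a = p;
    char line[64];

    for (int i = 0; i < a->nlines; i++) {
        int n = snprintf(line, sizeof line, "thread %d: line %d\n", a->id, i);
        if (n < 0 || (size_t)n >= sizeof line ||
            locked_write(a->fd, line, (size_t)n) != 0) {
            a->bytes = -1;
            return NULL;
        }
        a->bytes += n;
    }
    return NULL;
}

/* Start nthreads writers on fd and wait for them. Returns the total
 * number of bytes they wrote, or -1 if anything failed. */
long run_writers(int fd, int nthreads, int nlines)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct writer_arg *args = malloc(sizeof *args * (size_t)nthreads);
    long total = 0;
    int started = 0;

    if (!tids || !args) {
        free(tids);
        free(args);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct writer_arg){ .fd = fd, .id = i, .nlines = nlines, .bytes = 0 };
        if (pthread_create(&tids[i], NULL, writer, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (total < 0 || args[i].bytes < 0)
            total = -1;
        else
            total += args[i].bytes;
    }
    if (started != nthreads)
        total = -1;
    free(tids);
    free(args);
    return total;
}
```

Calling run_writers(STDOUT_FILENO, 8, 1000) with stdout redirected to a regular file and comparing the return value against the resulting file size is one way to check that nothing was dropped. Lines from different threads will still interleave in arbitrary order, which is fine; the lock only ensures that no two write() calls on that descriptor overlap. It does nothing for other processes sharing the same open file, and it obviously costs some parallelism.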

Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/