Re: PROBLEM: relay - stale data copied to user space

From: Tom Zanussi
Date: Thu Mar 19 2009 - 00:25:08 EST


Hi,

On Wed, 2009-03-18 at 16:07 +0100, Martin Peschke wrote:

>
> This is my theory:
> Timing matters. It's a race caused by improper protection of critical
> sections in a producer-consumer scenario. A bug in the bookkeeping
> allows a reader to read at a position that is just being written to.
>

It does look consistent with a reader reading an event that has been
reserved but not yet written, or only partially written, e.g. if an
event being written on one CPU was read by another CPU before the first
one finished. Can you see if the patch below to blktrace userspace helps?

Or, failing that, try explicitly using gettid() in place of getpid() in
the sched_setaffinity() call. If that doesn't help either: you had
mentioned previously that you would try to reproduce the problem on your
laptop. Were you able to do that? If so, it would help in debugging it
further...

Tom

diff --git a/blktrace.c b/blktrace.c
index 26b3afd..656ab7a 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -610,7 +610,7 @@ static int lock_on_cpu(int cpu)
 
 	CPU_ZERO(&cpu_mask);
 	CPU_SET(cpu, &cpu_mask);
-	if (sched_setaffinity(getpid(), sizeof(cpu_mask), &cpu_mask) < 0)
+	if (sched_setaffinity(0, sizeof(cpu_mask), &cpu_mask) < 0)
 		return errno;
 
 	return 0;

