Re: PROBLEM: relay - stale data copied to user space

From: Martin Peschke
Date: Thu Mar 19 2009 - 13:50:52 EST



On Wed, 2009-03-18 at 23:19 -0500, Tom Zanussi wrote:
> On Wed, 2009-03-18 at 16:07 +0100, Martin Peschke wrote:
> > This is my theory:
> > Timing matters. It's a race caused by improper protection of critical
> > sections in a producer-consumer scenario. A bug in the bookkeeping
> > allows a reader to read at a position that is just being written to.
> >
>
> It does look consistent with a reader reading an event that's been
> reserved but not yet written, or only partially written, e.g. if an
> event being written on one cpu was read by another before the first
> one finished.

So this is part of relay's design, and it's up to user space to make
sure that reader and writer are on the same CPU?

> Can you see if the below patch to blktrace userspace helps?

The patch appears to fix it. I will give it more testing in a larger
environment.

> Or failing that, explicitly using gettid() in place of getpid() in
> sched_setaffinity(). Or, failing that, you had mentioned previously
> that you would try to reproduce the problem on your laptop - were you
> able to do that? If so, it would help in debugging it further...

I wasn't able to reproduce it on my laptop. But then, it's a single-CPU
machine.
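
For reference, here is a minimal sketch of the gettid() variant mentioned
above. It is not the blktrace patch itself. The helper names
pin_calling_thread() and calling_thread_allowed_on() are made up for
illustration, and the thread ID is fetched through syscall(SYS_gettid) on
the assumption that the C library provides no gettid() wrapper. In a
multithreaded program getpid() returns the ID of the thread group leader,
so passing it to sched_setaffinity() would pin the main thread rather
than the caller.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: kernel thread ID of the calling thread. */
static pid_t calling_thread_id(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/*
 * Hypothetical helper: restrict only the calling thread to one CPU.
 * Returns 0 on success, -1 on error (errno set by sched_setaffinity).
 */
static int pin_calling_thread(int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	return sched_setaffinity(calling_thread_id(), sizeof(mask), &mask);
}

/*
 * Hypothetical helper: 1 if the calling thread may run on `cpu`,
 * 0 if it may not, -1 if the affinity mask could not be read.
 */
static int calling_thread_allowed_on(int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	if (sched_getaffinity(calling_thread_id(), sizeof(mask), &mask) != 0)
		return -1;
	return CPU_ISSET(cpu, &mask) ? 1 : 0;
}
```

Each per-CPU reader thread would call pin_calling_thread() with its own
CPU number before it starts reading that CPU's buffer. Passing 0 as the
pid to sched_setaffinity() also means "the calling thread", so that would
be an equivalent alternative.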

Thanks,
Martin

