Re: [PATCH 1/7] shm-signal: shared-memory signals

From: Ira W. Snyder
Date: Thu Aug 06 2009 - 16:51:19 EST


On Thu, Aug 06, 2009 at 09:11:15AM -0600, Gregory Haskins wrote:
> Hi Arnd,
>
> >>> On 8/6/2009 at 9:56 AM, in message <200908061556.55390.arnd@xxxxxxxx>, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Monday 03 August 2009, Gregory Haskins wrote:
> >> shm-signal provides a generic shared-memory based bidirectional
> >> signaling mechanism. It is used in conjunction with an existing
> >> signal transport (such as posix-signals, interrupts, pipes, etc) to
> >> increase the efficiency of the transport since the state information
> >> is directly accessible to both sides of the link. The shared-memory
> >> design provides very cheap access to features such as event-masking
> >> and spurious delivery mitigation, and is useful for implementing higher
> >> level shared-memory constructs such as rings.
> >
> > Looks like a very useful feature in general.
>
> Thanks, I was hoping that would be the case.
>
> >
> >> +struct shm_signal_irq {
> >> + __u8 enabled;
> >> + __u8 pending;
> >> + __u8 dirty;
> >> +};
> >
> > Won't this layout cause cache line ping pong? Other schemes I have
> > seen try to separate the bits so that each cache line is written to
> > by only one side.
>
> It could possibly use some optimization in that regard. I generally consider myself an expert at concurrent programming, but this lockless stuff is, um, hard ;) I was going for correctness first.
>
> Long story short, any suggestions on ways to split this up are welcome (particularly now, before the ABI is sealed ;)
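
For what it's worth, the usual trick is to group the fields by writer and
pad each group out to its own cache line. Something like the sketch below.
This is just an illustration, not a proposed ABI, and which side owns which
flag is my guess, not necessarily what shm-signal actually does:

#include <stdint.h>

#define SHM_CACHELINE 64 /* assumed line size; real code would use a per-arch constant */

/* Fields written only by the side raising the signal. */
struct shm_signal_sender {
	uint8_t pending;
	uint8_t dirty;
	uint8_t pad[SHM_CACHELINE - 2];
} __attribute__((aligned(SHM_CACHELINE)));

/* Fields written only by the side receiving the signal. */
struct shm_signal_receiver {
	uint8_t enabled;
	uint8_t pad[SHM_CACHELINE - 1];
} __attribute__((aligned(SHM_CACHELINE)));

struct shm_signal_irq_split {
	struct shm_signal_sender   sender;   /* remote writes, local reads */
	struct shm_signal_receiver receiver; /* local writes, remote reads */
};

Each side then only ever dirties its own line, so the line doesn't bounce
back and forth on every state change. It does cost you 128 bytes per
direction instead of 3, which may or may not matter for your ABI.
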
>
> > This gets much more interesting if the two sides
> > are on remote ends of an I/O link, e.g. using a nontransparent
> > PCI bridge, where you only want to send stores over the wire, but
> > never fetches or even read-modify-write cycles.
>
> /me head explodes ;)
>

I've actually implemented this idea for virtio. Read the virtio-over-PCI
patches I posted, and you'll see that the entire virtqueue
implementation NEVER uses reads across the PCI bus, only writes. The
slowpath configuration space uses reads, but the virtqueues themselves
are write-only.
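
If it helps, here is roughly the shape of it (heavily simplified, names made
up, barriers and flow control omitted -- this is not the actual
virtio-over-PCI code): each side keeps the ring it *reads* in its own local
memory, and the peer fills it with posted writes through the bridge window,
so neither side ever issues a read cycle on the bus.

#include <stdint.h>

#define RING_SIZE 256

/* Lives in the receiver's local RAM; only the remote sender writes it,
 * through the nontransparent-bridge BAR mapping. */
struct msg_ring {
	uint32_t prod_idx;
	uint32_t entries[RING_SIZE];
};

/* Sender: "remote" is the peer's ring mapped through the bridge.  The
 * sender keeps its own private producer index so it never has to read
 * prod_idx back across the link. */
void ring_send(volatile struct msg_ring *remote, uint32_t *my_prod,
	       uint32_t value)
{
	remote->entries[*my_prod % RING_SIZE] = value;	/* posted write */
	(*my_prod)++;
	remote->prod_idx = *my_prod;			/* posted write */
}

/* Receiver: everything here touches local memory only. */
int ring_recv(volatile struct msg_ring *local, uint32_t *my_cons,
	      uint32_t *value)
{
	if (*my_cons == local->prod_idx)
		return 0;				/* nothing new */
	*value = local->entries[*my_cons % RING_SIZE];
	(*my_cons)++;
	return 1;
}

In the real thing you obviously need write barriers between the entry and
the index update, plus a consumed index flowing back the other way so the
sender doesn't overrun the ring, but the key property is the one above:
all traffic on the link is stores.
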

Some trivial benchmarking against an earlier driver that did
writes+reads across the PCI bus showed that the write-only driver was
about 2x as fast. (Throughput increased from ~30MB/sec to ~65MB/sec).

I'm sure the write-only design was not the only change responsible for
the speedup, but it was definitely a contributing factor.

Ira
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/