Re: Make pipe data structure be a circular list of pages, rather

From: Ingo Oeser
Date: Fri Jan 14 2005 - 18:36:48 EST


Hi Linus,

Linus Torvalds wrote:
> That's where it gets interesting. Everybody needs buffers, and they are
> generally so easy to implement that there's no point in having much of a
> "buffer library".

But my hardware has the buffers on chip, if connected directly to each other.
So no system level buffering is needed.

> No. Use two totally separate fd's, and make it _cheap_ to move between
> them. That's what "splice()" gives you - basically a very low-cost way to
> move between two uni-directional things. No "memcpy()", because memcpy is
> expensive for large streams of data.

Here we agree already, so lets talk about the rest.

> If you end up reading from a regular file (or writing to one), then yes,
> the buffers end up being picked up from the buffer cache. But that's by no
> means required. The buffers can be just anonymous pages (like the ones a
> regular "write()" to a pipe generates), or they could be DMA pages
> allocated for the data by a device driver. Or they could be the page that
> contains a "skb" from a networking device.
>
> I really think that splice() is what you want, and you call "wire()".

That sounds really close to what I want.

Now imagine this (ACSCII art in monospace font):

[ chip A ] ------(1)------ [ chip B ] [ CPU ] [memory] [disk]
| | | | |
+----------(2)-----------+---(2)-----+----(2)--+--------+

(1) Is a direct path between these chips.
(2) Is the system bus (e.g. PCI)

(1) works without any CPU intervention
(2) needs the buffering scheme you described.

I like to use (1) for obvious reasons, but still allow (2).
User space should decide which one, since it's a policy decision.

I need to open the chips with default to (2), because the driver
doesn't know in advance, that it will be connected to a chip it can talk to
directly. Everything else will be quite ugly.

Please note, that this is a simplified case and we actually have many chips of
A/B and (1) is more like a software programmable "patch field" you
might know from a server rack. I want user space to operate this "patch
field".

Maybe "io router" is a descriptive term. No?

> Yes, I believe that we're talking about the same thing. What you can do in
> my vision is:
>
> - create a pipe for feeding the audio decoder chip. This is just the
> sound driver interface to a pipe, and it's the "device_open()" code I
> gave as an example in my previous email, except going the other way (ie
> the writer is the 'fd', and the driver is the "reader" of the data).
>
> You'd do this with a simple 'fd = open("/dev/xxxx", O_WRONLY)",
> together with some ioctl (if necessary) to set up the actual
> _parameters_ for the piped device.
>
> - do a "splice()" from a file to the pipe. A splice from a regular file
> is really nothing but a page cache lookup, and moving those page cache
> pages to the pipe.
>
> - tell the receiving fd to start processing it (again, this might be an
> ioctl - some devices may need directions on how to interpret the data,
> or it might be implicit in the fact that the pipe got woken up by being
> filled)

Yes, I understand and support your vision. Now I would like to use path (1)
the direct for this. Possible?

Maybe by making drivers optionally aware of splice() and defaulting to a
regular pipe, if they don't care?


Thanks and Regards

Ingo Oeser

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/