Providing generic pipelines and io routing as a Linux service

From: Ingo Oeser (ingo.oeser@informatik.tu-chemnitz.de)
Date: Fri Jul 18 2003 - 06:42:35 EST


Hi there,

I notice that we repeat the model of pipes over and over again
in many parts of the system.

We have socketpair(), we have pipe(), and we have routed audio
channels, which are often a set of interconnected pipes. I needed
to write my own kind of pipe to support a nice user interface for
my DSP subsystem, and I guess crypto is going to implement this
soon to support offloading of crypto tasks.

So what about a pseudofs and a syscall for providing that kind of
functionality?

I would suggest a syscall sys_generic_pipe() like this

int sys_generic_pipe(char *path, int fds[2]);

path - the path in the pseudofs
fds[2] - a set of fds connected to this pipe

fds[0] is only suitable for reading; fds[1] is only suitable
for writing.

This might sound like a disadvantage, but it is actually good,
since now you can build a kind of routing with shell commands.
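
To make the intended usage concrete, here is a rough sketch. The
syscall does not exist, so the wrapper, the syscall number and the
pseudofs path below are invented just for illustration:

/* Illustration only: generic_pipe() wraps the proposed syscall.
 * There is no real syscall number, so the call below simply
 * fails with ENOSYS; the pseudofs path is made up as well. */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define __NR_generic_pipe -1    /* placeholder, not assigned */

static int generic_pipe(const char *path, int fds[2])
{
        return syscall(__NR_generic_pipe, path, fds);
}

int main(void)
{
        int fds[2];

        /* Create a named gen_pipe visible under the pseudofs. */
        if (generic_pipe("/genpipes/dsp/in0", fds) < 0) {
                perror("generic_pipe");
                return 1;
        }

        write(fds[1], "data", 4);       /* fds[1] is the write end */
        /* fds[0] is the read end; another process could reach the
         * same pipe through its pseudofs path instead. */
        return 0;
}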

This routing could even be done with another syscall, which I call sys_io_route():

int sys_io_route(int readfd, int writefd, size_t buf_size);

readfd - fd we read our data from (must support read())
writefd - fd we write our data to (must support write())
buf_size - amount of memory which should be used to buffer this routing.
           If ZERO, then the kernel will negotiate the optimal size
           between the objects associated with readfd and
           writefd.

This syscall is basically what the shell does with "|".
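
A sketch of how using it might look (the syscall number, file name
and device path are invented; as written the call just fails with
ENOSYS):

/* Sketch: ask the kernel to route a file into a device, roughly
 * what "cat input.raw > /dev/example_sink" does in user space
 * today.  __NR_io_route is a placeholder for the proposed call. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

#define __NR_io_route -1        /* placeholder, not assigned */

int main(void)
{
        int src = open("input.raw", O_RDONLY);
        int dst = open("/dev/example_sink", O_WRONLY);

        /* buf_size == 0: let the kernel negotiate the buffer size. */
        int route = syscall(__NR_io_route, src, dst, (size_t)0);

        /* On success, src and dst would now belong to the io-router;
         * we would only keep the route handle, which supports
         * close() and ioctl(). */
        return route < 0;
}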

The needed buffer is allocated against this process' resource
limits if one of the fds is a leaf in the routing graph or if the
data must be modified to pass through this hop.

It never allows cycles in the routing graph and responds to any
attempt to introduce such a cycle with ELOOP or EDEADLK.
Note: Providing two fds that refer to the same kernel object is also a cycle.

If you need cycles (e.g. for testing routes), you must emulate
this call on your own, since the kernel cannot help you in this
case anyway.
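
Such an emulation is essentially the classic copy loop, for
instance (plain read()/write(), nothing proposed here):

/* User-space fallback: move data from readfd to writefd through a
 * small buffer; this also works when the fds form a cycle. */
#include <unistd.h>

static int io_route_emulate(int readfd, int writefd, size_t buf_size)
{
        char buf[4096];
        ssize_t n;

        if (buf_size == 0 || buf_size > sizeof(buf))
                buf_size = sizeof(buf);

        while ((n = read(readfd, buf, buf_size)) > 0) {
                ssize_t done = 0;

                while (done < n) {
                        ssize_t w = write(writefd, buf + done, n - done);
                        if (w < 0)
                                return -1;      /* e.g. EPIPE */
                        done += w;
                }
        }
        return n < 0 ? -1 : 0;
}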

After calling this syscall, the fds are closed, since the
io-router now owns this reference to the associated object. If
all duplicates of this fd are also closed and the route owns the
only copy, then the kernel-internal router is allowed to optimize
the route[1].

If a duplicate exists, then the router will update the usual info
associated with this object, like the file pointer, so you can
watch progress[2].

The route itself is also a file handle, but it supports only
close() and ioctl(). Therefore it is automatically excluded from
being passed to sys_io_route() and is automatically cleaned up
and reference counted. A route may become broken due to an IO
error. In this case the owners are sent a SIGPIPE, which they can
react to or simply die from (which removes the route).
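
A route owner that wants to react instead of dying could do
something like this (sketch; route_fd stands for the handle
returned by the proposed sys_io_route()):

/* Sketch: catch SIGPIPE and drop the broken route instead of dying. */
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t route_broken;

static void on_sigpipe(int sig)
{
        (void)sig;
        route_broken = 1;
}

static void watch_route(int route_fd)
{
        signal(SIGPIPE, on_sigpipe);

        while (!route_broken)
                pause();        /* real code would do other work here */

        close(route_fd);        /* dropping the reference removes the route */
}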

But now back to the generic pipelines (referred to as gen_pipes from here on).

These gen_pipes come in different flavors.
1) Simple data move
   - the writer must always write the size the reader expects,
     which can be determined by asking for the block size
   - amount of data readable is the amount of data
     written
   - can be completely skipped by io-router in a zero-copy manner

   Example: Atomic pipes

2) Simple data copy
   - the writer can write as much as it likes; the reader can
     read whatever is there.
   - amount of data readable is the amount of data
     written
   - can be replaced by simple buffering in io-router or
     transformed into 1), if io-router is the only owner of both
     fds.

   Example: Normal pipes, ptys, circular buffer

3) Synchronous data processor
   - after a write the data is internally modified but keeps the same size
   - amount of data readable is the amount of data
     written
   - cannot be optimized away by io-router, but the same buffer
     can be used for reading and writing

3a) Zero copy synchronous data processor
   - like 3), but you must read as much as you've written, since
     the associated object doesn't buffer
   - this also implies that you are constrained in write sizes
   
4) Asynchronous data processor
   - after a write the data is internally modified
   - amount of data readable is different from the amount of data
     written
   - cannot be optimized away by io-router, and a different buffer
     SHOULD be used for reading and writing

4a) Zero copy asynchronous data processor
   - like 4), but you must write in certain sizes and must always
     read as much as is readable, or the data is discarded.
   - cannot be optimized away by io-router, and a different buffer
     MUST be used for reading and writing

With these pipes in place, we separate the buffering from the
actual processing for devices which are basically accelerators.
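
From user space that separation could look roughly like this. The
device path and syscall number are invented; two routes feed a
flavor-3 style accelerator without any buffering in the driver:

/* Sketch: push a file through an accelerator-style device (a type 3
 * "synchronous data processor") using two routes, so the driver
 * never buffers anything itself.  /dev/crypto_accel and
 * __NR_io_route are invented; as written the calls fail with ENOSYS. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

#define __NR_io_route -1        /* placeholder, not assigned */

int main(void)
{
        int plain     = open("plain.bin", O_RDONLY);
        int accel_in  = open("/dev/crypto_accel", O_WRONLY);
        int accel_out = open("/dev/crypto_accel", O_RDONLY);
        int out       = open("cipher.bin", O_WRONLY | O_CREAT, 0600);

        /* plaintext -> accelerator, accelerator -> output file;
         * buf_size 0 lets the io-router negotiate (or share) buffers. */
        int r1 = syscall(__NR_io_route, plain, accel_in, (size_t)0);
        int r2 = syscall(__NR_io_route, accel_out, out, (size_t)0);

        /* Only the two route handles remain; the buffering lives in
         * the io-router, the driver only processes data. */
        return (r1 < 0 || r2 < 0);
}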

It will also simplify a lot of drivers which implement buffering
over and over again. We could then use these two mechanisms to
implement a framework similar to networking for character and
block IO and loosen up the distinction between these driver types
a bit, to enable buffered character drivers and zero-copy block
drivers.

What are the opinions on this for 2.7.x?

Regards

Ingo Oeser

[1] All pipes, pty-pairs, socket-pairs and any other generic-pipe
   user (which doesn't modify the data) along an io-route can be
   optimized away if they are owned only by the io-router.

[2] Now that we have explicit routes, could we add statistics on
   how much was read and written? For regular files we have the f_pos.