Re: Accel Hnadling (was Re: Thread-private mappings and graphics)

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Sun, 26 Dec 1999 01:22:51 +0000


James Simmons wrote:
> > Why? I think it would be much better to have user space check the
> > watermarks
>
> Userspace cannot overflow a hardware FIFO. Only the kernel code
> can do that, and you may or may not want the userspace -> kernel FIFO
> behavior to match that of the kernel -> hardware behavior. Userspace
> should wait on a semaphore or something.

Matching user->kernel FIFO behaviour to kernel->hardware FIFO behaviour
looks like needless complexity to me. What have I missed?

> > and flush the buffer as it reaches each water mark.
>
> Userspace should not be telling the kernel when to flush the
> kernel's buffers, only its own. Even if the hardware watermarks must be
> polled (no IRQs), those belong to the kernel.

You're right, userspace should not having anything to do with the
hardware buffer.

"Userspace flushes the buffer" is not a hardware buffer issue at all.

It's analagous to using "sendto" to send a packet over a network
interface. The device driver handles the device buffers. If there is
nothing in the queue, the packet goes right away otherwise it goes at
the next opportunity. Userspace doesn't know the details and it doesn't
need to know.

> > That was just one fault every <insert large block of memory>. You are
> > proposing faults much more often.
>
> Not really, and especially not when the load is high. When you
> are doing something like moving windows in X and the accel load is really
> low, you can afford to fault quite often (relatively speaking) without the
> resultant system overhead being too bad.

Why waste any overhead at all on this?

> In some cases the 4K intel pagesize minimum buffer size may turn out
> to be too large, and most flushes will be explicit from userspace or
> automated via the VSYNC IRQ handler to keep things running smoothly.

Suppose you're transferring 1Mbyte of primitives per frame. That's a
lot of page faults at 4k, so of course you'd want a much larger buffer.

> With Quake3, although you have a huge stream of primitives to pass
> to the kernel, you are going to be producing so many of them per frame at
> such a steady rate that you can drop the number of pagefaults quite
> dramatically and still be able to get a frame's worth of data to the
> kernel. The technique scales very well because of this factor, even on
> SMP.

??? Are you saying that flushing the buffer once per frame is adequate
for Quake3 because it produces a huge rendering load?

In that case it is adequate for lesser 3d apps, and per frame flushing
is trivial from user space, so where is the need for page faults at all?

> > One obvious complaint about using page faults is the overhead of
> > updating mappings. Inter-CPU TLB flushes are particularly expensive.
>
> See above. You don't end up having to do them often enough that
> it becomes a problem.

If you're only using a low fault rate, that sort of thing can be handled
from user space with signals or any other choice of application code.

> Overhead from the Linux fault handling code is an issue, but rawio and
> friends in 2.3 may be able to obviate some of the problem.

Indeed, but so far I've seen /no/ reason why page faults are needed at all!

> Also another note about using page faults to handle accel access. Everyone
> is worried about Inter-CPU TLB flushes. Their was a discussion about this
> as well as fork threads and exec effects on the accel engine. Remember
> OpenGL is designed around a graphics pipeline and each part of the
> pipeline could have different grpahics context state. As you go down the
> pipeline you switch the context state. Whats does this have to do with the
> kernel handling. Well each process, which will be in each part of the
> pipeline, must have a private grapohics context to self and a private
> mapping.

The processes do not need private mappings. There is no difficulty
writing an OpenGL pipeline using threads where all the memory mappings
are shared. The context pointer can live in a "thread specific pointer"
register like %gs on x86, for example.

> This means if a process mmaps a accel resource only that process
> can only have access to it at the time its scheduled to run.

Since the accel FIFO is in main memory which is to be DMAd or otherwise
transferred to the hardware at flush points, it doesn't matter how many
processes have access to it. That is a userspace problem, solved by the
OpenGL library.

There will definitely be threaded applications that want to
use 3d rendering capabilities. Even if OpenGL doesn't use threads
itself, it would be a shame to penalise the performance of those
applications.

> If the process forks and wants to use the accel resource then it must
> mmap it as well. This way it will also have its own private graphics
> context and private mapping. If a child tries to use the parents accel
> resource then it segfault. Also when you context switch the process
> this way you only have to flush only the local TBL.

But why have the private mapping at all?

Consider this architecture:

- When threaded, OpenGL API uses "thread local pointer" to
distinguish contexts.

- Accel resource is mapped into user space using static mmap().
No special fault handler -- this is ordinary memory.

- Userspace fills mapped memory with userspace rendering commands
(command structure to be designed, and may be tuned differently
for some classes of different hardware, with a generic command
set for hardware where it doesn't make any difference).

- Each fill, or group of fills, checks the fill address against
the next flush point or some other criteria. For the
efficiency obsessed, extremely efficient "cmpl %0,%1; jae 0f;
.section .text.outofline,\"ax\"; 0: ..." sequences can be used.

- Each flush from user to kernel space enters the command buffers
onto the kernel->hardware queue. Those commands are then
forwarded synchronoously or asynchronously to the hardware,
whatever is best for the particular device.

- Hardware watermarks influence how the kernel->hardware queue is
transferred to the hardware. (Just like a network device).
User space does not need to know about this.

- Hints, such as optimal command buffer size are available for
user space to make policy decisions. They may be ignored or
not. They may even vary adaptively according to whatever the
hardware watermarks indicate.

See, no page faults, in fact no special kernel support other than a
plain old device driver. 3d library can be used in threads or not as
you please, OpenGL or any other API of your choice. No "must not use
threads" restriction. System is secure because user space never has
direct access to the hardware.

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/