Re: DMA from/to user-space memory

Linus Torvalds (torvalds@transmeta.com)
Thu, 14 May 1998 10:13:14 -0700 (PDT)


[ I'm cc'ing to linux-kernel, because maybe somebody gets the push to
implement this, or maybe somebody wants to shoot holes in it ]

On Thu, 14 May 1998, Robert Kaiser wrote:
>
> For a project I recently did, I had to develop a device driver for
> a frame grabber card. This driver has a requirement to do busmaster
> DMA directly into user-space buffers. Having developed drivers for
> other UNIXes before, I was a bit surprised that Linux didn't already
> provide support for that. I looked through the kernel code and
> found that it was fairly easy to do by (mis-)using the mlock()
> function (more precisely, function do_mlock() in mm/mlock.c).

I feel that misusing the kernel mlock functionality is exactly the wrong
thing to do. It has horrible latency, and in general it just isn't "the
right thing". It may be ok for some particular applications, but it has
lots of down-sides (for example, there is no way for the kernel to handle
restricted DMA memory with this approach - the mlock approach doesn't know
about 16M limits or about the need for larger physically contiguous areas).

For example, some DMA engines are a _lot_ more efficient if they can have
slightly larger areas in their scatter-gather list: some devices will
generate an interrupt for _each_ entry in the SG list simply because they
are too stupid to do this automatically, so they need a bit of
hand-holding with the interrupt routine pointing them to the next entry.

So a good approach would need to be aware of these issues, and I feel
queasy about using the mlock approach because I suspect that if I ever
accept these patches into the standard kernel, I'll never see anybody try
to do it the way I'd prefer it done.

Anyway, the approach I'd prefer is to have something that is expressly
DMA-specific (you already added three system calls, so let's make those
system calls do something really DMA-specific), and instead of allocating
memory and _later_ telling the kernel that you want it for DMA (by that
time it may be too late to sanely fix up issues like 16M and contiguous
memory), you have those system calls set things up the way the DMA code
wants from the very beginning.

So the kind of interface I'd prefer is more akin to something like this:

typedef struct {
        unsigned long physaddr;
        unsigned long len;
} dma_entry_t;

void *dma_map(void *addr, size_t length,
              int prot, int flags,
              dma_entry_t *dma_table);

which would work pretty much like mmap() (and if you look at the
declaration for mmap() you'll find that this one looks similar). The
"addr" parameter would be the preferred virtual address you'd like the
mapping on, or NULL if you don't care, while "length" would be the size of
the area, and "prot" would be the same prot as for mmap(). "flags" would
be an extension of the mmap flags: you could have

- MAP_CONTIGUOUS: require it to be _one_ contiguous chunk, and return
  ENOMEM if none is available.
- MAP_LIMITED: require the memory to be physically below the 16MB mark.
- ... any other DMA-specific requirements - this may well be
  architecture-dependent ...

And then the "dma_table" would be something that the mapping process fills
in as it allocates the chunks. So for example, you could ask for a 64kB
area, and if you don't specify MAP_CONTIGUOUS, then the kernel might decide
to give you a "dma_table" that looks like

        0x00014000, 16384
        0x00102000, 8192
        0x001C0000, 8192
        0x00140000, 32768

depending on how it actually found memory (so it would try to give you
largish chunks, but it wouldn't guarantee it). The above information,
together with the information on where it is mapped in virtual space
(which is what the dma_map() system call would return) is sufficient for
you to now have full knowledge of what the virtual mapping for that area
is.
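For example, finding the physical address behind any byte of the mapping
is then a trivial user-space table walk (again assuming the zero-length
terminator convention from the sketch above):

/* Find the physical address backing byte "offset" of the mapping,
 * given the chunk table that dma_map() filled in.  Returns 0 if the
 * offset lies past the end of the mapped area. */
unsigned long offset_to_phys(const dma_entry_t *table,
                             unsigned long offset)
{
        const dma_entry_t *e;

        for (e = table; e->len != 0; e++) {
                if (offset < e->len)
                        return e->physaddr + offset;
                offset -= e->len;
        }
        return 0;
}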

So now you could build up your own scatter-gather table any way you'd like
to, which gives you quite a lot of freedom: you might want to include the
same physical range more than once, for example (and yes, those kinds of
things _do_ make sense - imagine graphics-related DMA where you want to
DMA patterns or similar).

Done right, you never need to have any "dma_unmap()", because you can just
use the normal munmap() on the region when you're done. Similarly, I
suspect that your "build_sglist()" is unnecessary, because you can do all
the building in user space, as you have full information about what's up.
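As a sketch of that last point: converting the chunk table into a device's
descriptor format is a few lines of user-space code. "struct sg_desc"
below is a stand-in for whatever layout the hardware actually wants, and
the zero-length terminator is the same assumption as before; repeating a
physical range (the pattern-DMA case above) is just emitting the same
chunk twice.

/* Build a device-specific scatter-gather list from the chunk table,
 * entirely in user space.  Returns the number of descriptors written,
 * or -1 if the device table is too small. */
struct sg_desc {
        unsigned long busaddr;
        unsigned long count;
};

int build_device_sg(const dma_entry_t *table,
                    struct sg_desc *sg, int max)
{
        const dma_entry_t *e;
        int n = 0;

        for (e = table; e->len != 0; e++) {
                if (n >= max)
                        return -1;
                sg[n].busaddr = e->physaddr;
                sg[n].count = e->len;
                n++;
        }
        return n;
}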

Linus
