Re: mapping user space buffer to kernel address space

From: Linus Torvalds (torvalds@transmeta.com)
Date: Tue Oct 17 2000 - 15:02:01 EST


On Tue, 17 Oct 2000, Andrea Arcangeli wrote:
>
> For example, if both threads are reading different parts of the disk into the
> same buffer, that's also a wrong condition that will give unpredictable results
> (or if they're reading the same part of the disk, why are they doing it twice?).

I'm not talking about users that do things that are obviously
meaningless.

But what about things like:
 - linearized circular buffers (where "linearized" means that the buffer
   is mapped twice or more consecutively virtually in memory, so that the
   user doesn't need to worry about the boundary conditions that normal
   circular buffers have)
 - multiple buffers per page.

And note in particular that you may not be able to page-align everything:
and it can be extremely costly (both in memory use and in cache footprint)
to always pad out your buffers etc.
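
A minimal user-space sketch of the "linearized" circular buffer described
above, assuming POSIX mmap() and a scratch file created with mkstemp() in
/tmp. The helper name ring_map_twice() is hypothetical and is not taken
from any kernel or library; it only illustrates how the same physical
pages can legitimately appear at two virtual addresses.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a window of 2*len bytes whose two halves
 * alias the same len bytes of backing store, or NULL on failure.
 * len must be a non-zero multiple of the page size.
 */
static void *ring_map_twice(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0 || len % (size_t)page != 0)
        return NULL;

    /* Anonymous backing file (assumption: /tmp is writable). */
    char path[] = "/tmp/ringXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }

    /* Reserve 2*len of contiguous virtual address space. */
    char *base = mmap(NULL, 2 * len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* Map the same file pages into both halves of the reservation. */
    void *lo = mmap(base, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + len, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                   /* the mappings keep the file alive */
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * len);
        return NULL;
    }
    assert(lo == (void *)base);  /* MAP_FIXED places it exactly here */
    return base;
}
```

With such a mapping, a write that runs past offset len simply lands at the
start of the buffer, so the user never splits a copy at the wrap point. It
also means one physical page is reachable through two virtual addresses,
which is exactly the case that per-virtual-address bookkeeping handles
badly.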

> But in map_user_kiobuf we can't avoid playing with the virtual memory,
> just as we can't avoid doing that in the remap_page_range case.

map_user_kiobuf() isn't "playing" with the virtual memory.

It's a simple lookup function, looking up the physical pages that
correspond to a specific range of virtual memory.

Nothing more.

> In fact the reason I don't like to put VM stuff into rawio is that I like
> the clean design that you described:
>
> o lookup the physical page
> o do I/O on the physical page

If you like it, why do you want to break it?

You _say_ that you like it, and yet you want to add extra conditions like

 o while I/O is pending the virtual mapping is fixed

Which basically means that you suddenly have lots of interactions with the
VM layer - even though everybody agrees that is a bad thing.

Your solution would be that you'd mark the _virtual_ page dirty, then lock
it so that the dirty bit cannot go away (and the page cannot be unmapped),
and then keep it locked until the IO is complete, and basically have all
dirty handling based on virtual addresses. Even though those virtual
addresses have _nothing_ to do with the IO itself, and you say that you
want to do IO on the physical page.

In contrast, I'm saying that direct IO that reads into a (physical) page
should obviously mark that (physical) page dirty. In the meantime, others
may choose to do IO to the same page. They could even use the same buffer,
because we don't lock things: this has extremely well-defined meaning in
the case of "write this page to these 100 devices asynchronously".
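
A toy user-space model of the two-step scheme argued for above, as a
rough sketch only. It is not kernel code: struct toy_page, struct toy_mm,
toy_lookup() and toy_io_done() are hypothetical names, and the pin count
is an assumption standing in for whatever keeps the page alive during IO.
The point it illustrates is that the dirty state lives on the physical
page, and nothing depends on the virtual mapping staying in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a physical page; not the kernel's struct page. */
struct toy_page {
    int  pins;    /* IOs currently holding the page */
    bool dirty;   /* set when IO has stored data into the page */
};

/* Toy page table: virtual page number -> physical page (or NULL). */
struct toy_mm {
    struct toy_page **pte;
    size_t npages;
};

/* Step 1: pure lookup. Pins the page; never modifies the mapping. */
static struct toy_page *toy_lookup(struct toy_mm *mm, size_t vpn)
{
    if (vpn >= mm->npages || mm->pte[vpn] == NULL)
        return NULL;
    mm->pte[vpn]->pins++;
    return mm->pte[vpn];
}

/*
 * Step 2: IO completes against the physical page. Only IO that stored
 * data into the page (a read from a device) marks it dirty.
 */
static void toy_io_done(struct toy_page *pg, bool wrote_into_page)
{
    assert(pg->pins > 0);
    if (wrote_into_page)
        pg->dirty = true;
    pg->pins--;
}
```

In this model, a hundred concurrent outputs of the same page are just a
hundred lookups followed by a hundred completions, and the page table
entry may be cleared or changed in between without affecting any of them.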

How would you suggest we handle the "write to 100 clients simultaneously"
issue?

Are you suggesting something like: if it is reading from a page (i.e.
writing the contents of that page somewhere else), we don't lock it, but
if it is writing to a page, we lock it so that the dirty bit won't get
lost?

Sure, that works (modulo the fact that it still has the issues with
serializing mmap's and accesses to other areas in the same page). But do
you really claim that it's the clean solution?

                Linus
