Re: [PATCH 1/2] Introduce copy_user_handle_tail routine

From: Linus Torvalds
Date: Mon Jul 07 2008 - 12:21:53 EST




On Mon, 7 Jul 2008, Vitaly Mayatskikh wrote:

> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > Now, the stuff that comes *before* that point is the "try to fix up one
> > byte at a time" thing, which I'd like to be simple and dumb. At least to
> > start with.
>
> Just to be clear: are these patches good enough now (to start with)?
> Or do they need to be improved further?

I think they are getting there. I'm obviously not merging them in 2.6.26,
but I'd be happy to do so for .27.

Obviously, I'd be even happier if it also went through the normal x86
review cycles (ie Ingo &co), but the current series is largely ack'ed by
me.

> Btw, how much does it cost the CPU to take a fault? Can it be compared with
> average time of find_vma()?

It's *much* higher than a find_vma(). It's on the order of several
thousand cycles, easy (well, it depends on uarch - on a P4, iirc any
exception is something like 1500 cycles *minimum*, and that's just for the
exception overhead, not the actual fault path).

But the thing is, it doesn't even need a find_vma(). We can avoid the
extra trap 99.9% of the time by knowing that the trap happened at a page
crosser (in *theory* a trap can happen in the middle of a page because
another CPU did a munmap() in the middle, but that's not a case we need to
even bother optimizing for). In particular, we *know* we shouldn't even try
to cross user pages. So the fixup routine can just do

	/* Think about it.. */
	#define BYTES_LEFT_IN_PAGE(ptr) \
		(unsigned int)((PAGE_SIZE-1) & -(long)(ptr))

	/* How much should we try to copy carefully byte-by-byte? */
	unsigned int max_copy = remaining;

	/* Don't even bother trying to cross a page in user space! */
	if (flags & DEST_IS_USERSPACE)
		max_copy = min(max_copy, BYTES_LEFT_IN_PAGE(dst));
	if (flags & SOURCE_IS_USERSPACE)
		max_copy = min(max_copy, BYTES_LEFT_IN_PAGE(src));

	/* Do the careful copy */
	while (max_copy--) {
		unsigned char c;
		if (__get_user(c, src))
			break;
		if (__put_user(c, dst))
			break;
		src++;
		dst++;
		remaining--;
	}

	if (flags & CLEAR_REMAINDER)
		memset(dst, 0, remaining);

	return remaining;

or similar. Note how this still uses the slow-and-careful byte-at-a-time
approach to the final copy, but it avoids - on purpose - even trying to
copy across page boundaries, and thus will never take a second trap in the
common case.
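
To see the effect outside the kernel, here is a rough user-space sketch of
the same loop. Everything in it is made up for illustration: the tiny
16-byte "page", the sim_* names, and the sim_get_user() stand-in that counts
simulated faults instead of trapping. It only models the case where the
source is in user space, and it is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values: a tiny page size keeps the example easy to follow. */
#define SIM_PAGE_SIZE 16u
#define SIM_PAGES     4u
#define SIM_SIZE      (SIM_PAGE_SIZE * SIM_PAGES)

static unsigned char sim_mem[SIM_SIZE];   /* pretend user address space */
static int sim_mapped[SIM_PAGES];         /* nonzero if the page is mapped */
static unsigned int sim_faults;           /* number of simulated faults */

/* Stand-in for __get_user(): returns nonzero on a simulated fault. */
static int sim_get_user(unsigned char *c, size_t addr)
{
	if (addr >= SIM_SIZE || !sim_mapped[addr / SIM_PAGE_SIZE]) {
		sim_faults++;
		return -1;
	}
	*c = sim_mem[addr];
	return 0;
}

/* Same arithmetic as BYTES_LEFT_IN_PAGE, for the simulated page size. */
static unsigned int sim_bytes_left_in_page(size_t addr)
{
	return (unsigned int)((SIM_PAGE_SIZE - 1) & (0 - addr));
}

/*
 * Byte-at-a-time tail copy from simulated user memory into a kernel
 * buffer. With clamp set, it never reads past the end of the source
 * page. Returns the number of bytes that were not copied.
 */
static unsigned int sim_copy_tail(unsigned char *dst, size_t src,
				  unsigned int remaining, int clamp)
{
	unsigned int max_copy = remaining;

	assert(dst != NULL);

	if (clamp) {
		unsigned int left = sim_bytes_left_in_page(src);

		if (left < max_copy)
			max_copy = left;
	}

	while (max_copy--) {
		unsigned char c;

		if (sim_get_user(&c, src))
			break;
		*dst++ = c;
		src++;
		remaining--;
	}

	/* Corresponds to the CLEAR_REMAINDER case. */
	memset(dst, 0, remaining);
	return remaining;
}
```

With page 0 mapped and page 1 unmapped, copying 20 bytes starting at offset
10 leaves 14 bytes uncopied either way. The clamped version stops at the
page boundary without touching page 1, so sim_faults stays at 0. The
unclamped version walks into page 1 and takes one more simulated fault.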

See? We don't actually care about vma boundaries or anything like that. We
just care about the only boundary that matters for faults: the page
boundary.
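
The page-boundary arithmetic in BYTES_LEFT_IN_PAGE can be checked in
isolation. A small sketch, assuming 4 KiB pages and using hypothetical
sketch_* names (the kernel gets PAGE_SIZE from the architecture):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Same arithmetic as BYTES_LEFT_IN_PAGE, written as a function.
 * Negating the address and masking with PAGE_SIZE-1 gives the distance
 * from addr up to the next page boundary. A page-aligned address
 * yields 0, not PAGE_SIZE.
 */
static unsigned int sketch_bytes_left_in_page(uintptr_t addr)
{
	/* The mask trick only works for power-of-two page sizes. */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

	return (unsigned int)((SKETCH_PAGE_SIZE - 1) & ((uintptr_t)0 - addr));
}
```

For example, sketch_bytes_left_in_page(0x1ffd) is 3 and
sketch_bytes_left_in_page(0x1001) is 4095, while a page-aligned address such
as 0x1000 gives 0. In the fixup above that means a pointer sitting exactly
at the start of a page is not retried at all.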

Linus